I’d recommend Baseball Hacks to anyone who has ever hung around here (or other baseball analysis sites) and thought “I wish I could get detailed stats like those” but didn’t know where to start. If you want not just to digest baseball research but check it and tinker with it yourself, and you’re willing to get your hands dirty, this is your book. And the dirtier you’re willing to get, the more you can get out of it.
Here’s my quick-and-dirty summary of the book:
Chapter 1, Basics of Baseball
Baseball information is on the internet! Whee!
Chapter 2, Baseball Games from Past Years
This is good stuff: getting yourself databases with all kinds of past game stats, hooking them up, querying them… and this is where we start to get into the real work: Perl makes an appearance. Still, it’s almost all database-and-SQL stuff, and isn’t that heavy – if you’re not scared of the word ‘database’ you’ll be fine.
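To give a sense of how light the database-and-SQL side is, here’s a minimal sketch of the kind of query this sort of setup lets you run. It isn’t from the book: it uses Python’s built-in sqlite3 module with an in-memory table instead of a real downloaded database, and the table name `batting`, its columns, the helper `batting_leaders`, and all the rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and made-up rows, only to show the shape of a query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batting (player TEXT, year INTEGER, ab INTEGER, h INTEGER, hr INTEGER)"
)
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?, ?, ?)",
    [
        ("Player A", 2001, 600, 180, 30),
        ("Player B", 2001, 550, 150, 40),
        ("Player C", 2001, 50, 20, 1),  # too few at-bats to qualify below
    ],
)


def batting_leaders(conn, year, min_ab=100):
    """Return (player, batting average) rows for one season, best first."""
    rows = conn.execute(
        """
        SELECT player, ROUND(1.0 * h / ab, 3) AS ba
        FROM batting
        WHERE year = ? AND ab >= ?
        ORDER BY ba DESC
        """,
        (year, min_ab),
    )
    return rows.fetchall()


leaders = batting_leaders(conn, 2001)
for player, ba in leaders:
    print(f"{player}: {ba:.3f}")
```

The `min_ab` cutoff is there because a raw sort by average puts tiny-sample players at the top, which is the first thing you trip over when you start poking at a stats table.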
Chapter 3, Stats from the Current Season
Noooow it starts to get heavy. Hack 25 is “Spider Baseball Sites for Data” for instance. Soon it’s into building a set of current-year stats and keeping it updated.
Chapter 4, Visualize Baseball Statistics
This is cool stuff, and instead of being programming/technical heavy, it’s much more into statistical analysis and visualization.
Chapter 5, Formulas
How to calculate a bunch of stats.
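For a flavor of what that means, here’s a rough sketch of three common rate stats using their standard definitions. I’m not claiming these are the specific formulas the chapter walks through, and the function names and the season line are invented for the example.

```python
def batting_average(h, ab):
    """Hits per at-bat."""
    return h / ab if ab else 0.0


def on_base_percentage(h, bb, hbp, ab, sf):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    denom = ab + bb + hbp + sf
    return (h + bb + hbp) / denom if denom else 0.0


def slugging_percentage(h, doubles, triples, hr, ab):
    """Total bases per at-bat."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab if ab else 0.0


# Made-up season line, not a real player.
season = dict(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, hbp=5, sf=5)

avg = batting_average(season["h"], season["ab"])
obp = on_base_percentage(
    season["h"], season["bb"], season["hbp"], season["ab"], season["sf"]
)
slg = slugging_percentage(
    season["h"], season["doubles"], season["triples"], season["hr"], season["ab"]
)
ops = obp + slg  # on-base plus slugging

print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```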
Chapter 6, Sabermetric Thinking
This is where you’d think things get interesting, and that’s kinda true. Here it’s about how to use the data you’re getting to look for good stuff. I disagree with how he goes about some of it (Hack 64, on clutch hitting, specifically) but it is good to see what kind of things the data can offer you.
Then there’s some fantasy stuff, which I’m sure would be great if you were interested in using your newfound data to try and find some crazy advantage. I skipped it, because that’s not me at all. And really, when Baseball Prospectus has a pretty good budgeting-and-forecasting thing, it seems a little pointless.
So what can this all get you? If you’re interested in historical baseball stats, and know or are willing to learn a little bit about databases, it’s a nice walkthrough from getting a freely available database of historical stats (The Baseball Archive) to setting it up nicely so you can do cool stuff. From there, well… even I don’t get into the kind of data-scraping that’s in here: I’d rather put up with ESPN’s ads and use their splits, or build it out of Retrosheet box scores the hard way, or whatever. And I’m fairly technical and willing to tinker with this stuff. Some of the more advanced stuff seems geared towards someone with fair technical skills who wants to tinker with both baseball data and with building their own framework, rather than someone looking to get started in baseball analysis.
I will say that there’s a lot of value in having access to even a nice historical database of raw stats: I find myself pawing around it all the time, looking for interesting stuff that ends up a throwaway reference in a piece here.
So this is a book where if you’re looking to get a lot more technical and want to do a lot more research independently, you’re going to dig it. If you’d just like to be able to do baseball-reference-y things, that part’s fairly easy too, and I’ve found it quite rewarding.
However, it’s not about baseball, or really about baseball statistics, or anything. It’s about (as you’d guess from the title) using computers and freely available data to hack stuff together.
Anyway, I hope this helps you determine whether it’ll be a good book for you or not. Check it out if that sounds interesting.