Evaluating Pitcher Talent
The discussion of what statistics are useful in evaluating a pitcher came up in the game thread, again, last night. This issue comes up quite a bit around here, since I use a lot of non-conventional numbers, and new readers often don’t know what they mean, where to find them, or why they should bother. So, last night, I decided to write something of a primer on why I like to use the statistics that I use, what their usefulness is, and why I don’t really care about things like ERA, WHIP, or batting average against.
All the stats referenced, by the way, can be found at the Hardball Times, and detailed game logs using these numbers can be found at Fangraphs, which are two of the most awesome sites out there right now.
The mainstream tools for evaluating a pitcher’s success and abilities are won-loss record and earned run average, with fantasy baseball players often adding WHIP (walks plus hits per inning pitched) to the discussion, since it’s one of their categories. These statistics attempt to sum up a pitcher’s effectiveness as a whole, giving an overview of his entire performance with just a few numbers.
I, personally, think they fail in that regard. ERA and WHIP lump together a long string of individual events involving multiple players, making it extremely tough to separate the credit due to the pitcher, the hitter, or the defense. WHIP and ERA tell you there is no difference between an inning where three batters drive the ball to the fence for three long flyouts and an inning where the pitcher strikes out the side. Clearly, they’re drastically different, but WHIP and ERA fail to account for the pitcher’s actual contributions. So, if the goal is to find out how well a pitcher actually threw, why not look at the micro level instead of the macro level? That’s what I prefer to do.
For instance, what are the possible outcomes of a single pitch?
A pitch can be thrown for a ball.
A pitch can be thrown for a strike.
A pitch can be swung at and missed.
The ball can be hit on the ground.
The ball can be hit on a line.
The ball can be hit in the air.
On any given pitch, those are the options. There are a few sub-categories under those options (outfield fly or infield fly, bunt grounder or normal grounder, etc.), but we can sum up every possible outcome of each pitch with those six options. Those outcomes might lead to wildly different events, but we’ll get to that later.
Which of these six outcomes are positive for the pitcher? Called strike, swinging strike, and groundball.
Which of these six outcomes are positive for the hitter? Called ball, line drive, and flyball.
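To make the split concrete, here is a minimal Python sketch that tags each of the six outcomes as pitcher-friendly or hitter-friendly. The enum and function names are hypothetical, and, like the list above, it ignores sub-categories such as bunts or infield flies.

```python
from enum import Enum


class PitchOutcome(Enum):
    """The six pitch outcomes listed above (names are hypothetical)."""
    BALL = "ball"
    CALLED_STRIKE = "called strike"
    SWINGING_STRIKE = "swinging strike"
    GROUNDBALL = "groundball"
    LINE_DRIVE = "line drive"
    FLYBALL = "flyball"


# Outcomes treated as good for the pitcher; the remaining three favor the hitter.
PITCHER_FRIENDLY = frozenset({
    PitchOutcome.CALLED_STRIKE,
    PitchOutcome.SWINGING_STRIKE,
    PitchOutcome.GROUNDBALL,
})


def favors_pitcher(outcome: PitchOutcome) -> bool:
    """True if the outcome is one of the three pitcher-friendly results."""
    return outcome in PITCHER_FRIENDLY


def pitcher_friendly_share(outcomes: list) -> float:
    """Fraction of a sequence of pitches that ended in a pitcher-friendly outcome."""
    if not outcomes:
        raise ValueError("need at least one pitch")
    return sum(favors_pitcher(o) for o in outcomes) / len(outcomes)
```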
If we can effectively determine which pitchers maximize their value in the “good outcomes” and minimize their harm in the “bad outcomes”, we can get a pretty firm grasp on who has pitching talent and who does not. Thankfully, Dave Studenmund wrote a fantastic article called “What’s a Batted Ball Worth” in the 2006 Hardball Times Annual, and it includes the following run value chart, which gives some context to those good and bad outcome categories:
Line Drive: .356 – in other words, an average line drive is worth about 36% of a run.
Non-Intentional Walk: .315
Intentional Walk: .176
Outfield Fly: .035
Infield Fly: -.243
These run values were taken from real-life play-by-play data, so they represent actual events, not some theoretical formula. As you can see, a hit-by-pitch is a better event for the offense than a walk, even though both simply put the batter on first base. Why? Because a hit-by-pitch is pretty much random, and can occur both in critical situations and in ones that aren’t. A walk, conversely, is far more likely to come in a run-scoring situation (often with first base open, where the walk doesn’t force the other runners along), which lowers its run value compared to the HBP.
As you can see, the difference between an outfield fly and a groundball isn’t huge, but it’s real, and it adds up over the course of a season. This is why, all things being equal, a groundball pitcher is better than a flyball pitcher. All things are almost never equal, and flyball pitchers tend to have higher strikeout rates than groundball pitchers, but the theoretical best pitcher alive would be a groundball pitcher, not a flyball pitcher.
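Because these are average run values, they can be added up over a sequence of events. The Python sketch below does that for the rows reproduced above; the dictionary keys are hypothetical labels, and since the table here doesn’t cover every event type, it illustrates the idea rather than serving as a complete run-value model.

```python
# Average run values from the chart above; only the rows listed there are included.
RUN_VALUES = {
    "line_drive": 0.356,
    "non_intentional_walk": 0.315,
    "intentional_walk": 0.176,
    "outfield_fly": 0.035,
    "infield_fly": -0.243,
}


def total_run_value(events):
    """Sum the average run value of a sequence of events.

    Positive totals favor the offense, negative totals favor the pitcher.
    Raises KeyError for any event type not in the table.
    """
    return sum(RUN_VALUES[event] for event in events)


# Three line drives versus three infield flies: same number of batters,
# very different value.
hard_contact = total_run_value(["line_drive"] * 3)
weak_contact = total_run_value(["infield_fly"] * 3)
print(f"line drives: {hard_contact:+.3f} runs, infield flies: {weak_contact:+.3f} runs")
```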
Also, bunting = bad.
So, now that we have some understanding of the possible outcomes and their relative value, our best bet is to set aside statistics like ERA or WHIP that leave out critical information and instead quantify, as best we can, the six potential outcomes and the events that result from them.
BB% (Walks per Total Batters Faced) does a nice job of evaluating how often a pitcher throws the ball in the strike zone. The average walk rate for a major league pitcher is 8%, though the DH makes the AL a higher-walk league than the NL. Anything under 5% is tremendous, and anything over 11% is a problem. The Hardball Times publishes BB% and K% in a slightly different manner, calling them BB/G and K/G and scaling them to look more like the per-nine-innings numbers people are used to seeing. BB/G (and the underlying BB%) is more effective than BB/9 because it accounts for the actual number of batters faced rather than using a proxy like innings pitched. It’s just more accurate.
K% (Strikeouts per Total Batters Faced) does a decent job evaluating how often a pitcher induces swings and misses or called strikes. 16% is league average, with 20% being terrific and 12% being a problem.
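Both rates are simple ratios over total batters faced. The Python sketch below computes them and sorts them into the rough bands given above; the band labels and the cut-offs between the quoted benchmarks are a simplification, and the season totals are made up for illustration.

```python
def rate_per_batter(count: int, batters_faced: int) -> float:
    """Events per total batters faced, e.g. BB% or K%, as a fraction."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return count / batters_faced


def grade_bb_pct(bb_pct: float) -> str:
    # Benchmarks from the text: ~8% average, under 5% tremendous, over 11% a problem.
    if bb_pct < 0.05:
        return "tremendous"
    if bb_pct > 0.11:
        return "problem"
    return "around average"


def grade_k_pct(k_pct: float) -> str:
    # Benchmarks from the text: ~16% average, 20% terrific, 12% a problem.
    if k_pct >= 0.20:
        return "terrific"
    if k_pct <= 0.12:
        return "problem"
    return "around average"


bb_pct = rate_per_batter(60, 800)   # hypothetical season: 60 walks in 800 batters
k_pct = rate_per_batter(170, 800)   # and 170 strikeouts
print(f"BB% {bb_pct:.1%} ({grade_bb_pct(bb_pct)}), K% {k_pct:.1%} ({grade_k_pct(k_pct)})")
```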
GB% (Groundballs per Balls In Play) does a very good job of telling us how often a pitcher induces a groundball. 42% is league average and anything over 50% is terrific, with the best sinkerball pitchers posting rates in the 60-65% range, while anything below 35% can be a problem if it’s not offset by a high strikeout rate.
LD% (Line Drives per Balls In Play) does a very good job of telling us how often a pitcher gives up line drives. 20% is league average, 17% is good, and 23% is a serious problem. Because of the way Baseball Info Solutions has scored line drives over the past couple of years, however, this number is hard to use for year-to-year analysis, and right now it’s not a very effective tool. We don’t use it very often.
FB% (Flyballs per Balls In Play) does a very good job of telling us how often a pitcher gives up flyballs that leave the infield, and is basically the counterpart to GB%. 36% is league average, while 32% is good and 40% could be a problem.
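All three batted-ball rates share a denominator, so they are easiest to compute together. This Python sketch assumes balls in play are split into groundballs, line drives, outfield flies, and infield flies (bunts are ignored), and uses the 20% strikeout benchmark above as a hypothetical line for a strikeout rate high enough to offset a low GB%.

```python
def batted_ball_rates(groundballs: int, line_drives: int,
                      outfield_flies: int, infield_flies: int = 0) -> dict:
    """GB%, LD%, and FB% as fractions of all balls in play.

    Assumes balls in play = groundballs + line drives + outfield flies + infield
    flies; FB% counts only flies that leave the infield.
    """
    in_play = groundballs + line_drives + outfield_flies + infield_flies
    if in_play == 0:
        raise ValueError("no balls in play")
    return {
        "GB%": groundballs / in_play,
        "LD%": line_drives / in_play,
        "FB%": outfield_flies / in_play,
    }


def describe_gb_pct(gb_pct: float, k_pct: float) -> str:
    # Benchmarks from the text: 42% average, over 50% terrific, under 35% a possible
    # problem unless offset by a high strikeout rate (20% K% treated as "high").
    if gb_pct > 0.50:
        return "terrific"
    if gb_pct < 0.35:
        return "fine, offset by strikeouts" if k_pct >= 0.20 else "possible problem"
    return "around average"
```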
So we have five statistics that cover each of the six possible outcomes pretty effectively. Not perfect, but they do a credible job. They aren’t park adjusted (and yes, parks have an effect on things you might not expect, such as walk rates, strikeout rates, and groundball rates), but they’re pretty close for the majority of cases.
Thanks to the work of guys like Voros McCracken, Tom Tippett, Keith Woolner, and Dave Studenmund, we now know that the result of a particular ball in play is not very consistent, and is due more to the actions of the hitter than the pitcher. So, when evaluating a pitcher’s talent, we need to adjust for outlier performances in converting balls in play into outs. If a pitcher has a lot of flyballs being caught on the warning track, or groundballs going right to infielders, that’s not likely to continue, and we shouldn’t assume that it will.
Not all balls in play are created equal, however, and so when we’re adjusting for outs on balls in play, we need to make sure we’re adjusting back to the type of ball in play the pitcher is giving up, since we’ve noted that they certainly do have control over their groundball or flyball tendencies.
An outfield fly becomes an out 77.7% of the time. A groundball becomes an out 74.8% of the time. A line drive becomes an out only 26.4% of the time, which is why it’s the worst possible outcome for a pitcher. An infield fly becomes an out 98.8% of the time. Because of this, flyball pitchers will post more outs on balls in play than groundball pitchers, and it won’t be a fluke. However, the non-outs that flyball pitchers give up are more harmful, and thus, the quality of the hits against flyball pitchers outweighs the relative lack of quantity. This is shown in the run value chart above, where an average groundball is a positive event for the pitcher and the outfield flyball is not.
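Since the out rate depends on batted-ball type, a fair baseline for a pitcher comes from his own mix of grounders, liners, and flies. The Python sketch below applies the out rates above to a hypothetical mix to get expected outs, then compares that with the outs actually recorded; a large gap is the kind of outlier performance that shouldn’t be expected to continue. The function names and the sample mix are illustrative assumptions.

```python
# Out rates by batted-ball type, from the figures above.
OUT_RATES = {
    "outfield_fly": 0.777,
    "groundball": 0.748,
    "line_drive": 0.264,
    "infield_fly": 0.988,
}


def expected_outs(batted_balls: dict) -> float:
    """Outs an average defense would record on this mix of balls in play."""
    return sum(OUT_RATES[kind] * count for kind, count in batted_balls.items())


def outs_above_expected(actual_outs: int, batted_balls: dict) -> float:
    """Positive: more outs than the mix predicts (likely some good fortune)."""
    return actual_outs - expected_outs(batted_balls)


# Hypothetical pitcher: 500 balls in play, tilted toward groundballs.
mix = {"groundball": 250, "line_drive": 95, "outfield_fly": 135, "infield_fly": 20}
print(f"expected outs: {expected_outs(mix):.1f}, "
      f"outs above expected: {outs_above_expected(360, mix):+.1f}")
```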
Infield flies are automatic outs, essentially, so it’s best to separate them from outfield flies for analysis like this. Since evidence has shown that pitchers don’t have strong year-to-year control over their infield fly percentage, however, when evaluating true talent levels it’s best to assume something like a normal infield fly percentage for a pitcher, rather than the one he’s posting at the moment.
Two other big factors that we’ve identified that can have a great effect on run scoring are home run rates and stranding runners. In general, flyball pitchers give up more home runs than groundball pitchers, which is why a groundball is a positive event for the pitcher and a flyball is not.
We’ve seen very little evidence that major league pitchers have significant control over how often their flyballs go over the wall, so occasionally you’ll see a wild swing in performance that is not indicative of a player’s true talent level, simply because a pitcher is having more or fewer flyballs go over the wall than should be expected. Felix Hernandez in April and May of this year was a great example of a guy who allowed a lot of home runs per flyball, and that rate has steadily dropped as the season has worn on. The average major league pitcher gives up home runs on about 11-12% of his outfield flies – significant variation from that is probably not an indicator of talent for a major league quality pitcher.
Stranding runners is also a big key, and a bit of a different animal. Naturally, good pitchers will strand more runners than bad pitchers. Since they’re good pitchers, they’re more likely to create an out in any situation, including with men on base, than if they weren’t a good pitcher. While the league average Left on Base Percentage is 70%, the bad pitchers often live in the low-to-mid-60% range, and the good pitchers live in the mid-to-high-70% range.
However, it’s not uncommon for bad pitchers to have flukily high strand rates that significantly lower their ERAs, and vice versa. Jarrod Washburn’s 2005 ERA was almost completely due to his high strand rate, as he posted the highest LOB% in the American League. That hasn’t held true in 2006, and we’ve seen his ERA rise a full run because of it. So, when you find a pitcher who is stranding runners at an unexpected rate compared to the talent implied by his BB%, K%, and GB%, it is prudent to expect that rate to regress toward a more normal level in the future.
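One quick way to spot a fluky stretch is to compare a pitcher’s home runs with what the league rate would predict from his outfield flies. This Python sketch uses 11.5%, an assumed midpoint of the 11-12% range above, and a made-up stat line; it makes no attempt at a formal significance test.

```python
# Assumption: midpoint of the 11-12% league HR-per-outfield-fly range quoted above.
LEAGUE_HR_PER_OF_FLY = 0.115


def hr_per_fly(home_runs: int, outfield_flies: int) -> float:
    """Share of a pitcher's outfield flies that left the park."""
    if outfield_flies <= 0:
        raise ValueError("outfield_flies must be positive")
    return home_runs / outfield_flies


def expected_home_runs(outfield_flies: int,
                       league_rate: float = LEAGUE_HR_PER_OF_FLY) -> float:
    """Home runs we'd expect if his flies left the park at the league rate."""
    return outfield_flies * league_rate


# Hypothetical stretch: 30 homers on 200 outfield flies.
print(f"HR/FB {hr_per_fly(30, 200):.1%}, "
      f"expected HR {expected_home_runs(200):.1f}, "
      f"excess HR {30 - expected_home_runs(200):+.1f}")
```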
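The text doesn’t spell out how LOB% is calculated. One widely published version, the one FanGraphs documents, is (H + BB + HBP - R) / (H + BB + HBP - 1.4 × HR). The Python sketch below implements that formula, on the assumption that it matches the LOB% you’re reading, and measures the distance from the 70% league average quoted above; the season line is hypothetical.

```python
LEAGUE_LOB_PCT = 0.70  # league average quoted above


def lob_pct(hits: int, walks: int, hbp: int, runs: int, home_runs: int) -> float:
    """Left-on-base percentage: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR).

    The 1.4 * HR term backs home runs out of the denominator, since a batter
    who homers can never be left on base; 1.4 is the weight this formula uses.
    """
    baserunners = hits + walks + hbp
    denominator = baserunners - 1.4 * home_runs
    if denominator <= 0:
        raise ValueError("not enough baserunners to compute LOB%")
    return (baserunners - runs) / denominator


# Hypothetical season line.
strand = lob_pct(hits=180, walks=60, hbp=5, runs=80, home_runs=20)
print(f"LOB% {strand:.1%} ({strand - LEAGUE_LOB_PCT:+.1%} vs. league)")
```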
So, looking at this breakdown, we see value in BB%, K%, GB%, HR/FB%, and LOB%. Those five statistics will tell you almost everything you need to know about why a pitcher is performing the way he is, and all of them are easily available at The Hardball Times. There’s nothing that ERA or WHIP will tell you that those component statistics do not, but ERA and WHIP certainly leave a lot of the underlying information out.
However, it is understandable that people want one number that sums up pitcher performance. If you really prefer not to look through the prism of BB/K/GB/HR-FB/LOB percentages, you can always use FIP, or Fielding Independent Pitching (which I often call Fielding Independent ERA, since it’s scaled to look like ERA), which gives you an expected ERA for a pitcher based on his walk, strikeout, and home run rates. FIP isn’t perfect, either – it assumes that HR/FB is indeed a skill, and it assumes that all pitchers are equal at stranding runners, neither of which is true, but it’s better than ERA for summing up a pitcher’s total contributions to run prevention.
If you want to get really crazy, you can even use Expected FIP, or xFIP, which substitutes the league-average home run per flyball rate for the pitcher’s actual HR/FB rate, giving a more accurate picture of how we should expect a pitcher to perform going forward as his HR/FB rate regresses toward the mean.
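For reference, the usual FIP formula weights home runs, walks, and strikeouts by 13, 3, and -2 per inning pitched and adds a league constant that puts the result on the ERA scale. This Python sketch implements that form; the 3.10 default constant is a placeholder assumption (the real constant is recalculated each season so league FIP matches league ERA), and the optional HBP term reflects the fact that some versions count hit batters along with walks.

```python
# Assumption: placeholder league constant; the real value changes every season.
DEFAULT_FIP_CONSTANT = 3.10


def fip(home_runs: int, walks: int, strikeouts: int, innings: float,
        hbp: int = 0, constant: float = DEFAULT_FIP_CONSTANT) -> float:
    """Fielding Independent Pitching on an ERA-like scale.

    FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.
    `innings` must be true innings (200 1/3 = 200.333...), not box-score 200.1.
    Leave hbp at its default of 0 for versions that ignore hit batters.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * home_runs + 3 * (walks + hbp) - 2 * strikeouts) / innings + constant


# Hypothetical season: 20 HR, 50 BB, 150 K in 200 innings.
print(f"FIP {fip(20, 50, 150, 200.0):.2f}")
```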
As I said, both FIP and xFIP have flaws, especially when it comes to evaluating relief pitchers, but if you’re insistent on using one number to sum up a pitcher’s contribution to run prevention, those would be your best bet.
In this age of wonderful information, there’s just no reason to use ERA and WHIP for serious analysis of a pitcher’s ability. We have better tools at our disposal. We’re doing ourselves an injustice if we continue to lean on inferior information.