If you’ve been hanging around the blog for any length of time, you’ve probably come to realize that we like numbers. They give us a better way to evaluate what we think we saw, and they compensate for our internal biases. Since a lot of baseball is essentially a set of isolated individual plays, it’s fairly easy to evaluate a player’s value to the team through their statistics, if you know which ones to use.
However, defensive evaluations have always been elusive to the statistical community. The numbers that were recorded, such as fielding percentage, were basically useless information, more misleading than anything else. For years, the players who have made the most memorable plays have been regarded as the elite defensive players simply because we’ve had no real objective standard of how to evaluate defense.
In the past 3-4 years, however, we’ve seen significant steps forward in the realm of defensive statistics. People interested in understanding the game better have begun purchasing play-by-play data that gives them far more information than we’ve had available previously, and have used that specific information to create systems that do a much better job of figuring out just how much value a player’s defense adds to his team. However, the age of defensive statistical analysis is still in its infancy, and as such, there is not a consensus system that is correct, or established as the industry standard. There are several systems built on solid theories that evaluate different parts of defensive prowess, and sometimes, these systems give widely contradictory results. So, what do we do then, if two systems, both well designed, can’t agree?
At this point, my preference is to take a prism perspective. All of the systems have strengths, and all have flaws. So I’d rather not take any of them at face value, but instead develop a general idea of a player’s abilities based upon as much good input as I can get. So, since it’s been requested and there’s nothing going on in Mariner-land, here’s an overview, with links for those interested, for the defensive statistics that I lend some credence to, and how I attempt to put them together to get an overall idea of a player’s contributions with the glove.
The most widely accepted system is Mitchel Lichtman’s UZR. It was initially introduced in 2003 in two articles over at Baseball Primer. UZR numbers for 2000-2003 were then posted on TangoTiger’s website, where they can still be found. Complete numbers for 2004 and 2005 are not available, as Lichtman was hired by the St. Louis Cardinals and his work became proprietary data for their club. He has released some UZR numbers for the past two years in different discussion threads at Baseball Think Factory, but for the most part, current UZR data is no longer public information.
After UZR, the best system is likely David Pinto’s PMR, or Probabilistic Model of Range. He has published his data at his Baseball Musings blog, and you can read the explanation of the system here. Pinto’s PMR is similar to Lichtman’s UZR, as they’re based on the same principles and both use proprietary play-by-play data, but the bonus is that Pinto is still publishing his work. A Baseball Think Factory poster going by Blackhawk converted PMR into a run metric, to help line up PMR with the other available numbers, on his blog. One downside to PMR that has been discussed is the addition of line drives to the model. PMR includes line drives in its evaluation, while UZR does not. Most people, including me, prefer a model without line drives, as turning line drives into outs appears at this point to be a non-repeatable skill.
Not everyone has access to play-by-play data, however, so several people have attempted to find a proxy for UZR or PMR with freely available information. Most of those efforts focus on adjusting the Zone Rating number that is available in every player’s ESPN profile. Chris Dial’s work on ZR is very good, and he’s probably the leading proponent of the value of ZR as an analytical tool. He’s also published an article explaining ZR that is essential reading if you’re interested in defensive statistical analysis. He also posted an interesting article on defense that led to some great discussion, and again, is basically required reading if you’re as fascinated by this stuff as I am. Also, Dial provides a worksheet with all the data for 2005, which is quite useful.
Another Baseball Think Factory poster, who goes by Chone Smith, posted his article on Tweaking Zone Rating, which is a similar effort to Dial’s work. Again, there is some more good discussion in the linked thread that is worth reading. And, like Dial, Smith provides a worksheet that shows all his data for 2005. Hooray for open source.
Using slightly different methodology than the others, David Gassko chipped in with his RANGE system, explained here, with a spreadsheet that contains data for 2004. He also penned an article on the Hardball Times site awarding his Gold Gloves, based on the numbers given by RANGE. The RANGE numbers were also featured in the 2006 Hardball Times Baseball Annual, and David has since tweaked his system a bit. People who purchase the THT Annual also get access to a spreadsheet with the RANGE data for 2005.
In the more subjective category, Tangotiger has published his Fans Scouting Report for several years now, asking people to fill out a survey of defensive evaluations for players they’ve watched on a regular basis. While it’s not numerically based, like the other systems we’re discussing, a compilation of subjective opinions can offer some interesting insight, and I’d be remiss if I didn’t mention it.
Lastly, the guys over at Baseball Prospectus also publish defensive numbers based on a system developed by Clay Davenport. The data is available on every player’s Davenport Translation player card on BP’s website, such as this one for Ichiro. BP’s system is the least transparent, however, as the nuts and bolts of how it works haven’t really been explained publicly very well, and significant changes have reportedly been made over the past few years, but no one really knows what those changes are. Of all the systems mentioned here, I give BP’s numbers the least credibility.
Those seven systems all attempt to evaluate defense on an individual level, which can be quite a challenge. Addressing it on a team level is significantly easier, and Dave Studenmund did a great job of presenting team-wide adjusted defensive efficiency in the aforementioned Hardball Times 2006 Annual. Studes’ article is a great sanity check for all the individual defense numbers.
In addition to these, John Dewan is publishing The Fielding Bible in February, which uses batted ball data similar to what led to Studes’ article on DER and David Gassko’s RANGE system. It should be worth checking out, and could be a valuable addition to the field.
Phew. That’s enough linking for now. That should give you a good overview of the different systems that I think add something significant to the conversation on defense. However, that’s an awful lot of information, and as mentioned previously, it rarely all agrees. So, once we gather all this information, how do we turn it into a conclusion?
Let’s use a couple of Mariner-centric examples. The first will be Ichiro, widely accepted as the best defensive right fielder in the game. Let’s take a look at how the defensive metrics grade him out.
UZR: +7 runs per season, 2001-2003
PMR: +3.5 runs, 2004. 2005 PMR data for RFs will be available shortly.
Dial’s ZR: -2, 2005
Smith’s ZR: +3, 2005
RANGE: +5, 2004, +18, 2005.
BP: +6 runs per season 2001-2005, +11 2005.
Fan Scouting: 92/100, Best Defensive Player in Game At Any Position, third year in a row
Studes DER: Mariners outfield defense +20 as a whole for 2005.
And you thought Ichiro would grade out well across the board, didn’t you? Dial’s ZR system is the only one that has him below average, and that’s only for one year of data, while everyone else has him at differing degrees of goodness. The Zone Rating based systems have him as being just solid, while RANGE has him as pretty darn excellent, BP’s metrics have him being quite good, and the fans scouting report thinks he’s the best defensive player alive. On a macro level, we know for a fact that the Mariners outfield defense has been well above average since Ichiro arrived in the States.
So, what would you conclude from that sphere of information? I’d say that it’s extremely unlikely that Ichiro is really a below average defensive player and fooling almost every system and every person who watches him play. Essentially, we “know” that he’s a good defender. We just don’t have a great grasp of how good. Knowing that RANGE has some issues with right field, and its run conversion numbers are a bit inflated in my opinion, I’d likely settle in with an opinion that Ichiro’s glove is worth something like 5 to 15 runs above an average right fielder in any given year. Good? Yes. Best defensive player on the planet? No. In my opinion, the spectacular plays he makes, combined with the friendliness of Safeco Field for outfielders, make him appear to be slightly better than he really is. But there’s almost no question that he is, in fact, a valuable defensive asset.
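For those who like to see the arithmetic, here is a rough Python sketch of the "look at everything at once" idea, using only the run figures listed above for Ichiro. To be clear, this is not how the 5 to 15 estimate was reached; that was a judgment call that also weighed known issues with individual systems. The function name and the choice of low, median, and high as the summary are assumptions made for illustration, and the two BP entries overlap in the seasons they cover.

```python
from statistics import median

# Runs above average for Ichiro, copied from the list above.
# Where a system lists two figures, both are kept as separate entries.
ichiro_runs = {
    "UZR (per season, 2001-2003)": 7.0,
    "PMR (2004)": 3.5,
    "Dial ZR (2005)": -2.0,
    "Smith ZR (2005)": 3.0,
    "RANGE (2004)": 5.0,
    "RANGE (2005)": 18.0,
    "BP (per season, 2001-2005)": 6.0,
    "BP (2005)": 11.0,
}


def summarize(ratings):
    """Return the low, middle, and high estimates plus how many are above average.

    Hypothetical helper for illustration; it weights every estimate equally,
    which a careful analyst probably would not do.
    """
    values = sorted(ratings.values())
    return {
        "low": values[0],
        "median": median(values),
        "high": values[-1],
        "above_average": sum(1 for v in values if v > 0),
        "estimates": len(values),
    }


summary = summarize(ichiro_runs)
print(f"{summary['above_average']} of {summary['estimates']} estimates are above average")
print(f"range {summary['low']:+.1f} to {summary['high']:+.1f}, median {summary['median']:+.1f}")
```

Seven of the eight estimates come out above average, with a median of +5.5 and a spread from -2 to +18. A summary like this is only a starting point. It can't know that one system struggles with right field or that another is a black box, which is exactly where the human judgment comes in.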
How about another example, and one that highlights one of the main flaws of defensive statistics currently available? Yuniesky Betancourt, who, I think, we all agree can play a little defense. All numbers just for 2005, obviously.
UZR: +11.5 (thanks Tango)
Dial’s ZR: -13.5 (initial post had this number incorrect)
Smith’s ZR: -6
RANGE: -3 (thanks David)
Fans Scouting: 86/100, top rated SS
Studes DER: Mariners were about 15 runs below average as an infield.
Talk about divergent opinions. You’ve got anywhere from 11 runs below average over the course of a full season to 22 runs above average. That’s just a massive swing, and obviously, both can’t be correct. Why would the numbers turn out so differently?
Sample Size. The generally accepted principle in defensive statistics is that you need at least two years of data to generate any kind of real conclusion about a player’s abilities, and you’d prefer to have more. With Betancourt, we basically have 1/3 of one season. There are just way too many non-fielding factors that could influence the number over that period of time. Ball in play distribution is a huge factor in small sample defensive numbers, for instance. If Betancourt happened to receive more easy-to-field grounders than others, his number would be through the roof. If teams were whacking uncatchable balls into the hole, his rating would suffer, and because of the small time frame, the impact of a few extra balls here and there would be magnified greatly.
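To put a rough number on how much a short sample magnifies luck, here is a back-of-the-envelope Python sketch. It treats every chance as an independent coin flip with the same odds of becoming an out (a simple binomial model), then prorates the result to a full season. Every figure in it is an illustrative assumption rather than a measured value: 450 chances per season for a shortstop, an 80 percent out rate, and 0.75 runs per play. The function name is hypothetical as well.

```python
import math


def prorated_noise_runs(chances_seen, chances_per_season=450,
                        out_rate=0.80, runs_per_play=0.75):
    """Standard deviation, in runs per full season, caused by luck alone.

    Treats each chance as an independent coin flip with the same out_rate
    (a binomial model), then scales the result to a full season.
    All default values are illustrative assumptions, not measured figures.
    """
    # Binomial standard deviation of plays made over the chances we observed.
    sd_plays = math.sqrt(chances_seen * out_rate * (1.0 - out_rate))
    # Scale the observed sample up (or down) to one full season of chances.
    scale = chances_per_season / chances_seen
    return sd_plays * scale * runs_per_play


third_of_season = prorated_noise_runs(chances_seen=150)
two_seasons = prorated_noise_runs(chances_seen=900)

print(f"1/3 of a season: +/- {third_of_season:.1f} runs per season")
print(f"two full seasons: +/- {two_seasons:.1f} runs per season")
```

Under these assumptions, a third of a season carries about 11 runs of pure noise per prorated season, against 4.5 runs for two full seasons. And this model is generous, since it assumes every ball is equally hard to field. Uneven ball in play distribution, the factor described above, would add more noise on top.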
When it comes to defensive evaluations, you simply cannot ignore the issue of sample size. Limited data samples can be more misleading than informative. If you don’t have a big enough sample, ignore the data. Seriously, I don’t value PMR loving Betancourt any more than I discount BP’s system hating him. I think they’re both near worthless, because they are drawing from too small a pool to be taken seriously.
When it comes to defense, historical context is huge. With players like Betancourt, we don’t have that, so we need to use the best available information we have, and in cases like his, that’s scouting reports. The M’s organization loves his defense. We loved his defense. Those who filled out the fans scouting report loved his defense. There’s no way that 50 games of data should overrule that information in your own mind. Scouting matters, especially when the data is flawed.
So, when discussing defensive evaluations, I say use as much good information as possible. Look at all the systems in context. Get as many years of data as possible. Look at the scouting reports. And then, draw conclusions that accurately represent your confidence level. If the systems aren’t accurate enough yet to give us one number (they aren’t), use a range. It’s okay to say that Ichiro is about 5-15 runs above average. That’s the extent of what we know at this point. No need to be more conclusive than we’re able to.
The defensive systems will get better. This is what we have right now. They’re useful, but more so when used together, rather than viewed as separate entities.