Projecting Future Performance
Last week, Geoff Baker wrote a series of blog posts that dealt with the issue that has been dominating the blogosphere conversation for most of the past three months – the playing time of Adam Jones, Raul Ibanez, and Jose Vidro, and how it should be distributed. Don’t worry – this post is not about that topic. At least, not explicitly. This post is about a commonly accepted principle that was laid out very well by Baker in that trio of entries. The idea is summed up in this statement:
It’s going to be hard to keep Raul Ibanez out of the lineup now that he’s hit six home runs in nine games. Equally tough to sideline Jose Vidro now that he’s back to being a hits machine. I was all for playing Adam Jones every day when those other guys were struggling back in July. But things have changed. The veterans have stepped up and earned their playing time of late.
In July, Geoff was on board with the belief that Adam Jones would be able to help the Mariners as an everyday player, and that the struggling veterans should be ceding playing time to the more talented youngster. He felt the struggles of guys like Vidro and Ibanez warranted a change, and Jones provided a superior option. He doesn’t feel that way anymore. Why? Because Raul Ibanez and Jose Vidro are hitting well recently, and Baker believes in the predictive power of the hot hand.
This isn’t a unique position. Almost everyone believes in the predictive power of the hot hand. The overwhelming majority of people in America base their future expectations – not just in sports, but in life – on their most recent experience. In sports, this is even more prevalent, as we’ve all witnessed players perform at a level far beyond what we expected them to do. Joe DiMaggio’s 56 game hit streak may be one of the most celebrated records in sports. Seattle saw Ken Griffey Jr. hit home runs in eight consecutive games. Or, to bring it back to the current reason for this discussion, Raul Ibanez has seven home runs in his last 48 at-bats after hitting six bombs in his first 372 at-bats. He’s on fire. He’s swinging the bat well. Each pitch looks like a beachball. Pick your cliché.
We all know a hot streak when we see one, even if we don’t know why they occur. There’s a debate about whether hot streaks are simply random fluctuations or an actual temporary change in a player’s skill. I don’t even begin to know the answer to that question, and I can see the validity of both arguments. But that’s not what this post is about.
No, this post is about the predictive power of the hot streak and how that should affect our expectations. As Geoff laid out in the three linked blog entries above, the common wisdom is that recent success should be a huge factor in determining playing time. Raul Ibanez is on fire (over 48 at-bats) and Adam Jones hasn’t earned his playing time (over 23 at-bats), and those performances were enough to change Geoff’s mind about who should be taking the field for the rest of the year. Getting away from that specific discussion, the issue I want to address is how much credence we should give recent performance in developing our expectations for how a player should perform going forward, even in the very near future.
And, you know me, I’m not a big fan of developing opinions on anecdotal evidence. I know there are random examples that we can cite to support any cause we want, but I don’t particularly care about that kind of analysis. I want to know what a large swath of history tells us about the predictive power of recent performance. Do hot hitters actually perform better, even for short periods of time, once we’ve identified that they’re hot hitters?
Keep in mind – this is a statistical argument. This isn’t one of these cases where all the people who think I’m an idiot who needs to care less about the numbers can tell me to get my head out of a spreadsheet and go watch a game, because the hot streak supporters are making an argument based on numbers. All I’m doing is testing the hypothesis of whether the numbers they’re choosing to believe in actually have any meaning.
Okay, so now that the overly long introduction is out of the way, let’s look at the evidence. The best research done on this issue that I’ve ever read comes from The Book: Playing the Percentages in Baseball, written by Tom Tango, Mitchel Lichtman, and Andy Dolphin. For people who care at all about baseball statistics, The Book is a must read. These guys are among the very best researchers on baseball issues alive, and The Book is a comprehensive review of almost any question relating to statistics you’d want to see asked. While it’s not the easiest reading you’ll ever have, it still comes highly recommended.
In the second chapter of The Book, the guys tackled the very question this post deals with – do hot streaks present any kind of real information that is useful in understanding how a hitter is likely to do going forward? To test this, they pulled in every play from the 2000 to 2003 seasons and identified hot and cold streaks as the upper and lower 5% of all performances over any five game sample that included at least 20 plate appearances. The best 5% of performances went into a hot bucket and the worst 5% went into a cold bucket. That gave them 543 unique players creating a total of 6,408 “hot streaks”, and 633 players creating a total of 6,489 cold streaks. With nearly 13,000 streaks in the sample, they eliminated nearly any bias complaint you could happen to have with the study, and created a sample large enough to give us a conclusive answer – do the players who have been identified as “hot hitters” perform better than expected based on their historical averages, and vice versa, do the slumping hitters perform worse than expected in their next few games?
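For a concrete sense of how that bucketing works, here’s a small sketch in Python. The game logs are synthetic and the details (per-game PA counts, window handling) are my own simplifications for illustration, not the exact method from The Book:

```python
import random

# Hypothetical sketch of the bucketing idea described above:
# rank every 5-game window with 20+ plate appearances by wOBA,
# then take the top and bottom 5% as "hot" and "cold" streaks.

random.seed(0)

# Synthetic per-game logs for one player-season: (plate_appearances, wOBA)
games = [(random.randint(3, 5), random.uniform(0.150, 0.600)) for _ in range(150)]

windows = []
for i in range(len(games) - 4):
    chunk = games[i:i + 5]
    pa = sum(g[0] for g in chunk)
    if pa < 20:          # minimum-playing-time filter, as in the study
        continue
    # PA-weighted wOBA over the five-game window
    woba = sum(g[0] * g[1] for g in chunk) / pa
    windows.append((woba, i))

windows.sort(reverse=True)
cutoff = max(1, len(windows) // 20)   # top/bottom 5%
hot = windows[:cutoff]
cold = windows[-cutoff:]
print(f"{len(windows)} qualifying windows, {len(hot)} hot, {len(cold)} cold")
```

The real study did this across every play from 2000 to 2003, which is how it ended up with nearly 13,000 streaks instead of the handful a single toy season produces.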
Without getting too deep into the statistical minutiae (for that, you should buy The Book), here are the numbers (from page 56, for those of you who already own it) – for offensive performance, they use a metric called Weighted On Base Average, or wOBA for short, which essentially sums up total offensive performance and scales it to look like on base percentage. Think of it like OPS, just better, and on a different scale. .340 is average, .400 is great, .300 is bad. Just like OBP – but as a total sum of offensive production.
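If you’re curious how a wOBA-style number gets built, here’s a rough sketch. The linear weights below are approximate values in the spirit of those published in The Book (the exact weights vary by season), and the stat line is a made-up example of a roughly average hitter:

```python
def woba(bb, hbp, singles, doubles, triples, hr, pa):
    """Weighted On Base Average, using linear weights close to those
    published in The Book (approximate; exact values vary by season)."""
    return (0.72 * bb + 0.75 * hbp + 0.90 * singles
            + 1.24 * doubles + 1.56 * triples + 1.95 * hr) / pa

# A hypothetical, roughly league-average season over 600 PA:
print(round(woba(bb=60, hbp=5, singles=100, doubles=30, triples=3, hr=15, pa=600), 3))
# → 0.347, near the .340 league-average mark
```

Each event is weighted by roughly how many runs it’s worth, which is why a home run counts for about twice a single instead of the equal credit OBP gives them.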
Average wOBA of hot hitters during streak: .587
Expected wOBA of hot hitters in 1 game after the streak: .365
Average wOBA of hot hitters in 1 game after the streak: .369
Expected wOBA of hot hitters in 5 games after the streak: .365
Average wOBA of hot hitters in 5 games after the streak: .369
As you can see, in the games immediately after being identified as hot (and hot doesn’t even begin to describe a .587 wOBA – that’s scorching), the players performed just .004 better than expected had we used a three year average of their past performance and had no knowledge of what they’d done in their previous five games. Statistically significant? Yes, but by the thinnest of margins.
Since I’m wary of overstepping fair use and giving away too much copyrighted material, rather than spelling out the actual numbers for the cold hitters, I’ll tell you that the result is basically the same on the opposite end – the players performed worse than expected by an ever so tiny margin immediately after a five game super slump. They also re-ran the data over a seven game sample and looked at the performance in the following three games after being identified as hot or cold, and found the numbers consistent with the five game samples.
But, I know, there will be some protests about how not all hot streaks are the same, and averaging 543 players together will be unfair to those who were really, truly hot. Thankfully, the guys included a list of the 10 hottest hitters over a seven game stretch. Marcus Giles had the most successful run, going 18 for 25 with 7 extra base hits from July 25th through July 29th of 2003, good for a .720/.731/1.160 line. 18 for 25! His next 5 games? 0 for 4, 2 for 4, 0 for 4, 2 for 3, and 0 for 4, a grand total of 4 for 19 and a .211/.348/.368 line.
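As an aside, the slash-line arithmetic here is easy to verify. In the sketch below, the 4-for-19 comes straight from the numbers above, while the walk and total-base counts are inferred values that reproduce the quoted line – they are not pulled from actual game logs:

```python
# Slash-line arithmetic for the 4-for-19 stretch quoted above.
ab, hits = 19, 4
walks = 4          # inferred to match the .348 OBP (assumes no HBP/SF)
total_bases = 7    # inferred to match the .368 SLG

avg = hits / ab
obp = (hits + walks) / (ab + walks)
slg = total_bases / ab
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")   # prints 0.211/0.348/0.368
```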
Giles was not alone. Of the ten hottest hitters from 2003, nine of them proceeded to hit worse than expected (again, based on historical averages and ignoring the recent hot streak) in their next three games, with only Magglio Ordonez bucking the trend and continuing to hit well. From July 20th through July 24th, Ordonez went 13 for 19 with seven extra base hits, then went 12 for 20 with five more extra base hits in his next five games. That gave him a 25 for 39 stretch with a 1.850 OPS over 46 plate appearances, one of the best runs in recent baseball history. From July 31st through August 3rd, Ordonez followed this 10 game hot streak with an 0 for 14 string of hitless games, and in the 47 plate appearances (spanning 11 games) after we could identify him as one of the hottest hitters in recent memory, Ordonez hit .244/.340/.366.
The first sentence of the conclusion of the chapter, quoted from The Book:
Knowing that a hitter has been in or is in the midst of a hot or cold streak has little predictive value.
Historical evidence suggests that knowing that a player is on fire should do essentially nothing for our expectations of what he’ll do going forward, even in the very near future. In fact, given the choice of being totally ignorant of recent performance or knowing exactly how each player performed in a small sample, you would, in almost every case, be better off being totally ignorant. The natural tendency to overstate the predictive power of the hot streak (or cold streak) outweighs the sliver of actual useful information that is included in hot streak analysis. Because of our own biases, we’d make more correct decisions if we had less data.
Of course, the ideal isn’t to have less data, but to understand our biases and compensate accordingly, allowing us to live in a data-filled world and still make optimal decisions as often as possible. That’s part of what we’re trying to do here, and what statistical analysis does a good job of explaining – identifying where human error leads us to draw conclusions that are unsupported by the realities of life.
Going back to the Mariner-centric discussion that started this all, we have the Raul Ibanez/Adam Jones situation. If you, like Geoff Baker did, believed at the end of July that Adam Jones was a better player than Raul Ibanez and should be taking the field every day, then nothing that has happened on the field since then should change your opinion. Raul Ibanez isn’t any more likely to hit well tonight than he was three weeks ago. His expected performance should be, for all intents and purposes, exactly the same. Whatever you thought of him on July 31st, you should also think of him now.
History paints a clear picture. Again, quoting from The Book (page 45):
One of the running themes of this book is that, very frequently, fans and analysts make too much from too little.
This is an important bias to keep in mind when performing any kind of analytical exercise. Our natural emotional reactions lead us to overvalue what has happened recently, and too often, we draw incorrect conclusions about what is going to happen based on things that have little or no real predictive value.
I actually have a lot more to write on the subject of correct player evaluations and projections (including talking about longer hot streaks, such as Jose Vidro’s, and how to evaluate a real change in performance), but for time and space reasons, I’m going to have to make that a post for another day.
Before I go, I’m going to make a request – please don’t turn the comments into another chance to rehash the same old argument we’ve been having for the last three months in the comment threads. If you feel that Ibanez should be starting due to clubhouse chemistry, veteran experience, or if you never felt that Jones was better than Ibanez, that’s fine – that’s also not what this post is about. The topic is about the predictive power of hot and cold streaks. I’ll be a much happier author if that’s what we talk about in the comments.