Projecting Future Performance

Dave · August 20, 2007 at 8:38 am · Filed Under Mariners 

Last week, Geoff Baker wrote a series of blog posts that dealt with the issue that has been dominating the blogosphere conversation for most of the past three months – the playing time of Adam Jones, Raul Ibanez, and Jose Vidro, and how it should be distributed. Don’t worry – this post is not about that topic. At least, not explicitly. This post is about a commonly accepted principle that was laid out very well by Baker in that trio of entries. The idea is summed up in this statement:

It’s going to be hard to keep Raul Ibanez out of the lineup now that he’s hit six home runs in nine games. Equally tough to sideline Jose Vidro now that he’s back to being a hits machine. I was all for playing Adam Jones every day when those other guys were struggling back in July. But things have changed. The veterans have stepped up and earned their playing time of late.

In July, Geoff was on board with the belief that Adam Jones would be able to help the Mariners as an everyday player, and the struggling veterans should be ceding playing time to the more talented youngster. He felt the struggles of guys like Vidro and Ibanez warranted a change, and Jones provided a superior option. He doesn’t feel that way anymore. Why? Because Raul Ibanez and Jose Vidro are hitting well recently, and Baker believes in the predictive power of the hot hand.

This isn’t a unique position. Almost everyone believes in the predictive power of the hot hand. The overwhelming majority of people in America base their future expectations – not just in sports, but in life – on their most recent experience. In sports, this is even more prevalent, as we’ve all witnessed players perform at a level far beyond what we expected them to do. Joe DiMaggio’s 56-game hit streak may be one of the most celebrated records in sports. Seattle saw Ken Griffey Jr hit home runs in eight consecutive games. Or, to bring it back to the current reason for this discussion, Raul Ibanez has seven home runs in his last 48 at-bats after hitting six bombs in his first 372 at-bats. He’s on fire. He’s swinging the bat well. Each pitch looks like a beachball. Pick your cliché.

We all know a hot streak when we see one, even if we don’t know why they occur. There’s a debate about whether hot streaks are random fluctuation of events or an actual change in skills for a temporary period of time. I don’t even begin to know the answer to that question, and I can see the validity of both arguments. But that’s not what this post is about.

No, this post is about the predictive power of the hot streak and how that should affect our expectations. As Geoff laid out in the three linked blog entries above, the common wisdom is recent success should be a huge factor in determining playing time. Raul Ibanez is on fire (over 48 at-bats) and Adam Jones hasn’t earned his playing time (over 23 at-bats), and those performances were enough to change Geoff’s mind about who should be taking the field for the rest of the year. Getting away from that specific discussion, the issue I want to address is how much credence we should give recent performance in developing our expectations for how a player should perform going forward, even in the very near future.

And, you know me, I’m not a big fan of developing opinions on anecdotal evidence. I know there are random examples that we can cite to support any cause we want, but I don’t particularly care about that kind of analysis. I want to know what a large swath of history tells us about the predictive power of recent performance. Do hot hitters actually perform better, even for short periods of time, once we’ve identified that they’re hot hitters?

Keep in mind – this is a statistical argument. This isn’t one of these cases where all the people who think I’m an idiot who needs to care less about the numbers can tell me to get my head out of a spreadsheet and go watch a game, because the hot streak supporters are making an argument based on numbers. All I’m doing is testing the hypothesis of whether the numbers they’re choosing to believe in actually have any meaning.

Okay, so now that the overly long introduction is out of the way, let’s look at the evidence. The best research done on this issue that I’ve ever read comes from The Book: Playing the Percentages in Baseball, written by Tom Tango, Mitchel Lichtman, and Andy Dolphin. For people who care at all about baseball statistics, The Book is a must read. These guys are among the very best researchers on baseball issues alive, and The Book is a comprehensive review of almost any question relating to statistics you’d want to see asked. While it’s not the easiest reading you’ll ever have, it still comes highly recommended.

In the second chapter of The Book, the guys tackled the very question this post deals with – do hot streaks present any kind of real information that is useful in understanding how a hitter is likely to do going forward? To test this, they pulled in every play from the 2000 to 2003 seasons and identified hot and cold streaks as the upper and lower 5% of all performances over any five game sample that included at least 20 plate appearances. The best 5% of performances went into a hot bucket and the worst 5% went into a cold bucket. That gave them 543 unique players creating a total of 6,408 “hot streaks”, and 633 players creating a total of 6,489 cold streaks. With nearly 13,000 streaks in the sample, they eliminated nearly any bias complaint you could happen to have with the study, and created a sample large enough to give us a conclusive answer – do the players who have been identified as “hot hitters” perform better than expected based on their historical averages, and vice versa, do the slumping hitters perform worse than expected in their next few games?
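For anyone who wants to see the mechanics, here is a rough sketch of that bucketing step. It is not the authors' code: the function names, the single made-up hitter, and the simplified per-plate-appearance "value" (1 for a good outcome, 0 otherwise) are all assumptions for illustration. The real study pooled every player from 2000 to 2003 and measured wOBA.

```python
import random

def five_game_windows(games, window=5, min_pa=20):
    """Return (start_index, rate) for each run of `window` consecutive games
    that totals at least `min_pa` plate appearances.

    `games` is a list of (plate_appearances, value) tuples; `value` is a
    hypothetical stand-in for the summed offensive value in that game.
    """
    out = []
    for start in range(len(games) - window + 1):
        chunk = games[start:start + window]
        pa = sum(g[0] for g in chunk)
        if pa >= min_pa:
            out.append((start, sum(g[1] for g in chunk) / pa))
    return out

def label_streaks(windows, tail=0.05):
    """Put the top `tail` share of windows in a hot bucket and the bottom
    `tail` share in a cold bucket."""
    ranked = sorted(windows, key=lambda w: w[1])
    k = int(len(ranked) * tail)
    cold = ranked[:k]
    hot = ranked[len(ranked) - k:] if k else []
    return hot, cold

# Synthetic 150-game season for one made-up hitter with constant talent.
rng = random.Random(0)
season = []
for _ in range(150):
    pa = rng.choice([3, 4, 4, 5])
    value = sum(1.0 for _ in range(pa) if rng.random() < 0.34)
    season.append((pa, value))

hot, cold = label_streaks(five_game_windows(season))
print(f"{len(hot)} hot windows, {len(cold)} cold windows")
```

Even with talent held perfectly constant, the top and bottom 5% of windows exist by construction, which is why the interesting question is what happens in the games after them.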

Without getting too deep into the statistical minutiae (for that, you should buy The Book), here are the numbers (from page 56, for those of you who already own it) – for offensive performance, they use a metric called Weighted On Base Average, or wOBA for short, which essentially sums up total offensive performance and scales it to look like on base percentage. Think of it like OPS, just better, and on a different scale. .340 is average, .400 is great, .300 is bad. Just like OBP – but as a total sum of offensive production.

Average wOBA of hot hitters during streak: .587
Expected wOBA of hot hitters in 1 game after the streak: .365
Average wOBA of hot hitters in 1 game after the streak: .369
Expected wOBA of hot hitters in 5 games after the streak: .365
Average wOBA of hot hitters in 5 games after the streak: .369

As you can see, in their sixth game after being identified as hot (and hot doesn’t even begin to describe a .587 wOBA – that’s scorching), the players performed .004 better than we would have expected had we just used a three year average of their past performance, with no knowledge of what they’d done in their previous five games. Statistically significant? Yes, but by the thinnest of margins.
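To put that .004 in perspective, here is a quick back-of-the-envelope calculation using only the three figures quoted above. The variable names are mine, and "share retained" is just my shorthand for how much of the streak's excess over expectation showed up afterward, not a metric from The Book.

```python
streak_woba = 0.587       # average wOBA during the hot streak
expected_woba = 0.365     # expectation from the three year average
actual_next_woba = 0.369  # actual wOBA after the streak

streak_excess = streak_woba - expected_woba
carryover = actual_next_woba - expected_woba
share_retained = carryover / streak_excess

print(f"Excess during streak: {streak_excess:.3f}")
print(f"Excess after streak:  {carryover:.3f}")
print(f"Share of streak excess that carried over: {share_retained:.1%}")
```

By this rough measure, less than 2% of the gap between the streak and the expectation carried into the following games.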

Since I’m wary of overstepping fair use and giving away too much copyrighted material, rather than spelling out the actual numbers of the cold hitters, I’ll tell you that the result is basically the same on the opposite end – the players performed worse than expected by an ever so tiny margin immediately after a five game super slump. They also re-ran the data over a seven game sample and looked at the performance in the following three games after being identified as hot or cold and found the numbers consistent with the five game samples.

But, I know, there will be some protests about how not all hot streaks are the same, and averaging 543 players together will be unfair to those who were really, truly hot. Thankfully, the guys included a list of the 10 hottest hitters over a seven game stretch. Marcus Giles had the most successful run, going 18 for 25 with 7 extra base hits from July 25th through July 29th of 2003, good for a .720/.731/1.160 line. 18 for 25! His next 5 games? 0 for 4, 2 for 4, 0 for 4, 2 for 3, and 0 for 4, a grand total of 4 for 19 and a .211/.348/.368 line.
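If you want to check the Giles arithmetic yourself, the batting average part takes a few lines. This only covers batting average, since the post doesn't give the walks or total bases needed for the on-base and slugging numbers.

```python
# (hits, at-bats) for Giles' five games after the streak, as listed above
next_five = [(0, 4), (2, 4), (0, 4), (2, 3), (0, 4)]

hits = sum(h for h, _ in next_five)
at_bats = sum(ab for _, ab in next_five)

streak_avg = 18 / 25
after_avg = hits / at_bats

print(f"{hits} for {at_bats}: {after_avg:.3f} after vs {streak_avg:.3f} during the streak")
```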

Giles was not alone. Of the ten hottest hitters from 2003, nine of them then proceeded to hit worse than expected (again, based on historical averages and ignoring the recent hot streak) in their next three games, with only Magglio Ordonez bucking the trend and continuing to hit well. From July 20th through July 24th, Ordonez went 13 for 19 with seven extra base hits, then went 12 for 20 with five more extra base hits in his next five games. That gave him a 25 for 39 stretch where he ran a 1.850 OPS over 46 plate appearances and is one of the best runs in recent baseball history. From July 31st through August 3rd, Ordonez followed this 10 game hot streak with an 0 for 14 series of hitless games, and in the 47 plate appearances (spanning 11 games) after we could identify him as one of the hottest hitters in recent memory, Ordonez hit .244/.340/.366.

The first sentence of the conclusion of the chapter, quoted from The Book:

Knowing that a hitter has been in or is in the midst of a hot or cold streak has little predictive value.

Historical evidence suggests that knowing that a player is on fire should do essentially nothing for our expectations of what he’ll do going forward, even in the very near future. In fact, given the choice of being totally ignorant of recent performance or knowing exactly how each player performed in a small sample, you would, in almost every case, be better off being totally ignorant. The natural tendency to overstate the value of the predictive power of the hot streak (or cold streak) outweighs the sliver of actual useful information that is included in hot streak analysis. Because of our own biases, we’d make more correct decisions if we had less data.
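Here is a toy simulation of that "better off ignorant" idea. It is my own sketch, not anything from The Book, and it rests on one big assumption: the hitter's talent is constant, so streaks are pure noise. Under that assumption, it compares two forecasts of the next game: the long-run rate (a stand-in for the multi-year average) and the rate over the last five games. The .340 talent level and four plate appearances per game are made-up inputs.

```python
import random

def simulate(trials=20000, talent=0.340, pa_per_game=4, streak_games=5, seed=1):
    """Return mean squared forecast error for (baseline, recent-form) forecasts.

    Assumes constant talent: every plate appearance succeeds with
    probability `talent`, regardless of what came before.
    """
    rng = random.Random(seed)
    recent_pa = pa_per_game * streak_games
    err_baseline = 0.0
    err_recent = 0.0
    for _ in range(trials):
        # Success rate over the last five games
        recent = sum(rng.random() < talent for _ in range(recent_pa)) / recent_pa
        # Success rate in the next game, which we are trying to forecast
        nxt = sum(rng.random() < talent for _ in range(pa_per_game)) / pa_per_game
        err_baseline += (nxt - talent) ** 2
        err_recent += (nxt - recent) ** 2
    return err_baseline / trials, err_recent / trials

mse_baseline, mse_recent = simulate()
print(f"Forecast from long-run rate:    MSE {mse_baseline:.4f}")
print(f"Forecast from last five games:  MSE {mse_recent:.4f}")
```

The recent-form forecast comes out worse because it piles the noise of a 20 plate appearance sample on top of the noise in the next game. That shows what happens if streaks carry no information at all; the study's finding is that the real-world signal is only barely above that.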

Of course, the ideal isn’t to have less data, but to understand our biases and compensate accordingly, allowing us to live in a data-filled world and still make optimal decisions as often as possible. That’s part of what we’re trying to do here, and what statistical analysis does a good job of explaining – identify where human error leads us to draw conclusions that are unsupported by the realities of life.

Going back to the Mariner-centric discussion that started this all, we have the Raul Ibanez/Adam Jones situation. If you, like Geoff Baker did, believed at the end of July that Adam Jones was a better player than Raul Ibanez and should be taking the field every day, then nothing that has happened on the field since then should change your opinion. Raul Ibanez isn’t any more likely to hit well tonight than he was three weeks ago. His expected performance should be, for all intents and purposes, exactly the same. Whatever you thought of him on July 31st, you should also think of him now.

History paints a clear picture. Again, quoting from The Book (page 45):

One of the running themes of this book is that, very frequently, fans and analysts make too much from too little.

This is an important bias to keep in mind when performing any kind of analytical exercise. Our natural emotional reactions lead us to overvalue what has happened recently, and too often, we draw incorrect conclusions about what is going to happen based on things that have little or no real predictive value.

I actually have a lot more to write on the subject of correct player evaluations and projections (including talking about longer hot streaks, such as Jose Vidro’s, and how to evaluate a real change in performance), but for time and space reasons, I’m going to have to make that a post for another day.

Before I go, I’m going to make a request – please don’t turn the comments into another chance to rehash the same old argument we’ve been having for the last three months in the comment threads. If you feel that Ibanez should be starting due to clubhouse chemistry, veteran experience, or if you never felt that Jones was better than Ibanez, that’s fine – that’s also not what this post is about. The topic is about the predictive power of hot and cold streaks. I’ll be a much happier author if that’s what we talk about in the comments.

Comments

277 Responses to “Projecting Future Performance”

  1. JeffnBham on August 20th, 2007 8:54 am

    Thanks Dave.

    Based upon your recent hot streak of well-researched columns I predict your next one will be a grand-slam as well.

  2. Username on August 20th, 2007 8:58 am

    Thanks Dave for another great post; your content is some of the best and most original I come across on the web (sports-related or not).

    I look forward to more information of when a possible real change in performance has/is occurring.

  3. Jeff Nye on August 20th, 2007 8:58 am

    Awesome post, Dave.

    It always amazes me that people give so much credence to “hot” and “cold” streaks; at the base, it all boils down to understanding the importance of sample size, and that seems to me to be a fairly easy concept to “get”.

    I’m looking forward to the post about how to tell the difference between a “hot streak” and a true change in expected future performance; I think a lot of the confusion comes from people not being able to differentiate between the two.

    My own personal thought is that you need a pretty substantial sample size before you can start saying that someone has “turned it around”, on the order of a half season if not more. Two weeks doesn’t tell you anything.

  4. dahut on August 20th, 2007 9:00 am

    One question that comes to mind is how the authors defined the extent of a hot/cold streak. A simplistic rebuttal is to say that by defining the end of a hot streak, of course the next series of at bats will regress to the mean. It’s most likely that the authors looked at the question in more detail than you went into.

    But I think the problem is that conventional wisdom is so strong in this case. It would take a supremely confident manager to buck this prevailing psychology.

  5. mmccall on August 20th, 2007 9:04 am

    Dave,

    What if Raul’s current performance can be attributed to recovery from injury or the correction of bad habits developed as a result of compensation for injury? Do you think a distinction can be made between what is seemingly a random spike in performance and increased performance having an underlying cause?

  6. AomoriMariner on August 20th, 2007 9:04 am

    Thanks for a great post, as always.

    Don’t both fans and managers seem to already understand this at some level when it comes to cold streaks? If an All Star player goes into a slump during the season, we acknowledge it as a slump and don’t expect that the player will continue to struggle for the rest of the year.

    Something, then, about success makes us less willing to accept a streak for what it is. Of course, during a hot streak Player A might indeed be able to carry a team for a few games and his bat needs to be in the lineup. Once the streak is over, how does a manager make the fans forget recent events and play the numbers? Or is that just part of why the manager collects his substantial paycheck to put the best possible team on the field each day?

  7. zzyzx on August 20th, 2007 9:09 am

    “Whatever you thought of him on July 31st, you should also think of him now.”

    I don’t quite buy that. I couldn’t find stats for July 31, but I did a rough calculation (multiplying his monthly pre-August splits by at bats, adding them, and dividing that by all pre-August at bats) to get his OPS on 7/31 – .694. His current OPS for 2007 is .787.

    So IMO we should judge Raul differently on 8/20 than on 7/31 because we have more information about the 2007 season. Sure, there’s the risk of overreacting to his 1.385 August OPS, but how much of the call for Raul to leave was an equal overreaction to his .503 July OPS?

  8. marc w on August 20th, 2007 9:14 am

    A very nice post, and anything referencing The Book is cool with me.
    But I’m not sure that this “This isn’t one of these cases where all the people who think I’m an idiot who needs to care less about the numbers can tell me to get my head out of a spreadsheet and go watch a game, because the hot streak supporters are making an argument based on numbers” is true.

    Some people may make a numbers-based, ‘extend the trend line’ argument based on the numbers. But others, I think, are making a management argument; that players would balk at seeing guys who’ve ‘carried the team’ benched in favor of a rookie. We could trot out all the stats we wanted, but many players might see that move as capricious. So – we’ve now left the realm of stats behind, and there’s not a whole lot of meaningful argument to be had about something like that. But I think it’s a big part of why someone like Baker is saying what he’s saying. Could be wrong…

    Second, something about this: “Whatever you thought of him on July 31st, you should also think of him now” seems odd to me. Is this… results-based analysis?

  9. Mike Honcho on August 20th, 2007 9:15 am

    zzyzx – You are missing the point. Each of the hot streaks identified by Tango, et al., resulted in a higher OPS for each particular player. What their research showed is that it should not be expected that a player’s production will be higher from the hot streak on out than it was before the streak.

  10. Ben Ramm on August 20th, 2007 9:16 am

    Since I became aware of these types of studies over a decade ago, the interesting topic has not been reconfirming them. Instead, I find it fascinating why people resist this data so vigorously. Do you really think that Geoff Baker would actually care about this kind of evidence? He must have encountered it at some point. The initial Tversky study of the “hot hand” in basketball is almost 25 years old. Baker is far more interesting than Finnigan, but he still considers certain pieces of evidence relevant without regard to whether any evidence suggests that the evidence is relevant. That is, going 7 for 10 is evidence (of something), but little evidence supports that it is relevant to predicting what will happen on the 11th at bat. So, if you’re going to respond to Baker, or even if you’re going to make a general point, what about explaining why people refuse to accept information like the information you’ve presented here?

  11. scraps on August 20th, 2007 9:17 am

    There are also a bunch of people who claim their argument isn’t just based on the recent numbers, and say that you can see how much better the guy on the hit streak is swinging the bat, aren’t you watching the actual games, etc.

  12. Manzanillos Cup on August 20th, 2007 9:19 am

    I think a lot of us here have experienced being “in the zone” on a baseball field in high school or college. It seems like a very real experience – so I can see why most people have a tough time dismissing it as random – and I would have had a big problem being benched when I thought I was in the middle of one. However, the key is that even if the “zone” is real, it doesn’t appear to be predictable or permanent.

    As a player, I remember that real upgrades in performance seemed to present themselves much differently than a hot streak. I remember lots of times when I would get worse (as I tried some new technique) before I got better…

  13. Dave on August 20th, 2007 9:20 am

    My own personal thought is that you need a pretty substantial sample size before you can start saying that someone has “turned it around”, on the order of a half season if not more. Two weeks doesn’t tell you anything.

    One of the things that I’ve been gravitating towards for a few years, and am now firmly in the boat of, is evaluating changes in skills rather than results. I believe that any sustainable deviation in results will be the byproduct of a change in skills, and through a better understanding of what statistics to look at as well as quality scouting information, I think we can identify skills changes that will allow us to see what players have actually improved, rather than which players are just riding a nice wave.

    What if Raul’s current performance can be attributed to recovery from injury or the correction of bad habits developed as a result of compensation for injury? Do you think a distinction can be made between what is seemingly a random spike in performance and increased performance having an underlying cause?

    Sure – the injury factor is one of the main legitimate causes for deviation in performance. However, I think we need to be careful in just randomly assigning the injury excuse to any player who suddenly sees a change in performance. What changed with Raul Ibanez on August 3rd that wasn’t true on August 2nd, for instance? Or how is his claim that he’s finally healthy this time any different from the one he made after hitting two home runs in Cleveland back in June?

    We’ve consistently cited Adrian Beltre’s near-death experience after having a botched surgery to have his appendix removed in the winter of 2001 as a factor in his poor performance in 2002 and 2003. I think, in that case, there’s a legitimate medical reason to point to, and a definitive date of when the change in his status took place.

    With Ibanez, or players like him, all we have is a randomly selected endpoint of when his performance changed. We’re retroactively deciding when he “got healthy” based on when he started hitting well. That’s backwards, and it’s not something we should be in the business of doing.

    So IMO we should judge Raul differently on 8/20 than on 7/31 because we have more information about the 2007 season. Sure, there’s the risk of overreacting to his 1.385 August OPS, but how much of the call for Raul to leave was an equal overreaction to his .503 July OPS?

    We’ll cover this more in the projection post, but basically, if you’re allowing 48 at-bats to significantly alter your projection, you’re overvaluing current year data in lieu of prior year data.

  14. HamNasty on August 20th, 2007 9:23 am

    As usual, great post Dave. I find myself in the middle of both arguments swaying back and forth. I look back to my days playing sports and how it related.

    I am looking forward to that post about when a hot streak turns into a performance change and evaluating it very much!

  15. Dave on August 20th, 2007 9:23 am

    But others, I think, are making a management argument; that players would balk at seeing guys who’ve ‘carried the team’ benched in favor of a rookie.

    I should probably go back and put an addendum in the post, because this is something I meant to cover before I ran out of time – this post really has nothing to do with whether Ibanez or Jones should be playing. That will be covered in the projection post. I understand the inherent problems with benching the hot hand and how that could be a political nightmare, and I think that should, to some extent, be a factor in the decision making process.

    But, for now, all I’m taking issue with is the idea that the recent surge in performance should drastically alter our expectations of what we’re going to see going forward. I’m not dealing with the playing time issue specifically here.

    Second, something about this: “Whatever you thought of him on July 31st, you should also think of him now” seems odd to me. Is this… results-based analysis?

    Well, the assumption in the statement was that you shouldn’t be evaluating Raul only based on his 2007 data through July, either. Any projection worth its salt will factor in multiple years of data.

  16. Uncle Ted on August 20th, 2007 9:26 am

    What would you say if the argument for Raul went something more like this. “It’s not that we believe in the hot streak, rather, we think that his recent explosion is evidence that his apparent demise was merely apparent and now we believe that he will be the .280/.350/.470 (pecota) hitter we thought he’d be. That combined with defense may or may not be better than Jones, but we think Raul’s projection is a safer bet than Adam’s. Moreover, half a season is too small of a sample to count the guy out.”

  17. Seth on August 20th, 2007 9:26 am

    Isn’t this sort of like the stock market? I mean, you understand that stocks (like batting averages) fluctuate over time, so you leave your money in the whole time so you don’t miss the hot streaks.

    You can’t predict from day-to-day whether a stock will go up or down.

    Seems to me that (for better or worse) the M’s are treating Raul Ibanez and Richie Sexson like blue-chip stocks, that they expect will–over time–perform well. And in the case of Ibanez, at least, it’s paid off.

  18. Dave on August 20th, 2007 9:26 am

    However, the key is that even if the “zone” is real, it doesn’t appear to be predictable or permanent.

    That’s a much better summary statement than the one I put in the post.

  19. Dan W on August 20th, 2007 9:27 am

    So, perhaps a more reasonable projection for Raul is something akin to his most recent 3 year period, during which his OPS was approximately 816 (sorry I do not know how to quickly find a more precise 3 year split, but this is in the ballpark) as opposed to his ridiculous numbers over the last couple of weeks.

  20. zzyzx on August 20th, 2007 9:27 am

    7 – You’re kind of missing my point though in that it’s unfair to throw out the hot streaks but not do the same argument for the cold ones. On July 31, I think we were judging Raul more harshly than we “should” have because his July was so bad.

    11 – “We’ll cover this more in the projection post, but basically, if you’re allowing 48 at-bats to significantly alter your projection, you’re overvaluing current year data in lieu of prior year data.”

    Looking at prior year data would make the .787 OPS more realistic than the .694, as that’s a lot closer to his past few years. IMO a lot of the Adam Jones buzz came out of fear that Ibanez had fallen off of the aging cliff, and that was an overreaction to July.

    I’d rather have Jones out there because he’s probably at least Ibanez’s equal as a hitter and a far superior defender, but that is a different argument than when it looked like Raul had also lost it as a hitter.

  21. 51isMYsavior on August 20th, 2007 9:30 am

    I think many are missing the point. Correct me if I’m wrong, but I think the entire goal of this post is to illustrate the idea that just because someone is 7-10 doesn’t mean they are more or less likely to hit or get out on the 11th at bat (thanks #8).

    Indeed, the law of averages does NOT change because someone has hit safely in 7 of his last 10 at bats.

    If you’re flipping a coin, you have a 50% chance to flip heads and tails. No matter if you’ve flipped 10 heads in a row, the 50% still stands.

    Ibanez is hitting .287 lifetime. That’s 1133 hits in 4003 at bats. Analyses MUST use this sample size in determining the likelihood of a hit, not the last 48 at bats.

  22. Spanky on August 20th, 2007 9:30 am

    Wow!! You have rocked the foundations of my baseball soul! I always knew that a “hot streak” ends at some point and that people have a tendency to over-project the hot streak beyond the time that statistical data shows the hot streak is over. But this is amazing.

    Question: Would you conclude from your research that 5 games is a typical length of a hot streak? What would be considered the mean of a hot streak?

  23. Dave on August 20th, 2007 9:32 am

    Ibanez is hitting .287 lifetime. That’s 1133 hits in 4003 at bats. Analyses MUST use this sample size in determining the likelihood of a hit, not the last 48 at bats.

    Actually, most data beyond 3+ years has been shown to have no real value either. It doesn’t really matter what Raul Ibanez did in 1998. Adding in career numbers is often just as misleading as adding in recent small samples.

    I really need to finish the projection post, eh? A lot of these subjects will be covered in that.

  24. HamNasty on August 20th, 2007 9:33 am

    Dave when you said “Because of our own biases, we’d make more correct decisions if we had less data.”

    Is that falling into the same theory as betting on the NCAA tournament, where the secretary who never watched a game in her life wins the pot?

    In coaching it is somewhat true; if you have enough of an idea about your personnel but leave the biases at the door, then I think you are right on.

  25. Dave on August 20th, 2007 9:35 am

    Seems to me that (for better or worse) the M’s are treating Raul Ibanez and Richie Sexson like blue-chip stocks, that they expect will–over time–perform well. And in the case of Ibanez, at least, it’s paid off.

    Right – the M’s are strong believers in track records, and fervently believe that players will perform just as they did the year before until proven otherwise. The problem, however, is that because they don’t understand player aging curves, this leaves them in the position of being first hand witnesses to the collapse of guys who are really just done as major league players.

    The assertion that Ibanez was just the next in the long line of guys who fell off a cliff was premature, certainly. We were overvaluing his 2007 performance in calling his career over. Mea culpa.

  26. zzyzx on August 20th, 2007 9:37 am

    Dave – I’m definitely interested in reading that. To try to paraphrase where I’m coming from, part of my reaction to Ibanez is a note in the back of my head saying, “Raul is 35, players at that age can suddenly lose their skills.” A 100 point drop in OPS might just be random variations with small sample sizes or it might be the beginning of the end.

    If you have insight into how to tell the difference between the former and the latter, I’d definitely be interested in reading it. Hopefully, I’d be able to follow it too.

  27. rrose on August 20th, 2007 9:39 am

    8

    I am reminded of the 19th Century philosopher Herbert Spencer, of whom it was said that the definition of tragedy was “a beautiful theory killed by an ugly fact”. Alas, there is no shortage of people who share Spencer’s notion of the tragic, in any realm of knowledge or human interest.

  28. Safeco Hobo on August 20th, 2007 9:39 am

    From seeing the games, I doubt anyone would argue Raul has been hitting the ball better during the recent ‘hot’ streak. However, I would be curious to know statistically (if at all possible) how much other factors that the batter cannot control contribute to their hot or cold hitting streaks.

    Some of Raul’s big hits came in weather conditions that helped the ball carry, the ballparks he was hitting balls out were very forgiving (ie: chicago), and some of the pitchers were giving up meatballs and Raul just capitalized.

    The question is, when do we actually look at the numbers from the past games and realize it might not just be Raul seeing beach balls but other contributing factors helped. So management might realize an upcoming game in San Diego against Randy Johnson may be a good time to rest Raul, or at least not bat him 4th, just because he is on a 15 for 20 hot streak against the White Sox and Tampa. (I know Randy doesn’t play in San Diego, just thinking of a tough lefty in a tough hitting park)

  29. Mere Tantalisers on August 20th, 2007 9:40 am

    I think it’s somewhat unfair to use the top and bottom 5% of performances to show that the production level is unsustainable. Over a short stretch like five games the outliers will certainly be farther out, and certainly the level of production would be unsustainable.

    I think a more insightful question to ask might be ‘how likely is another hot streak after the one initially identified?’ or how soon to follow? I don’t disagree with what you’re saying, Dave, not at all. I just wonder if the findings would be the same if the questions asked were slightly different.

    A batter’s line over a season is not built linearly, as we all know. It is a compound of hot and cold streaks (though perhaps more for some than for others) and in this case the argument could be made that Ibanez’s recent run is a regression to the performance expected of him based on his last three years. In that case it is not so much a hot streak as a pendulum swing…

  30. Uncle Ted on August 20th, 2007 9:42 am

    Dave, can you say something about expected reliability of projections in that post? It seems to me that if you have two hitters one of whom has a slightly higher median projection but has a much wider range you might at this point in the season go with the safer bet. Of course this all depends on whether you need to make up ground and how much, and if you need to make up lots of ground then you’d increase your post season expectations by going with the riskier option. I know nothing about whether Raul or Adam fits either of these descriptions, nor do I understand reliability rankings like Pecota’s, but I’d like to.

  31. Mike Honcho on August 20th, 2007 9:43 am

    Question: Would you conclude from your research that 5 games is a typical length of a hot streak? What would be considered the mean of a hot streak?

    Good question – I was wondering this too. Does the analysis change if the hot streak is a more prolonged “warm streak” – which is probably what we are seeing from Vidro?

  32. rrose on August 20th, 2007 9:43 am

    ooops, #27 was a response to something in #10. Sorry for the error.

  33. JLC on August 20th, 2007 9:43 am

    I agree that “being in the zone” is a real phenomenon, and that going into and coming out of it are not predictable.

    The concept is a little more complex in baseball, where hitting .500 means failing half the time. Given that, I can understand the hitter’s and the viewer’s tendency to assume the hot streak is lasting longer than it actually does.

    The other human tendency is to see patterns, even when there are none, and that screws us up all the time.

  34. Aaron on August 20th, 2007 9:46 am

    Re: recent performance as a predictor of the future.

    Just for the sake of discussion, can’t you turn that same argument around on yourself to some extent, though?

    Say the team (or Baker) believes that Vidro and Ibanez have been valuable major-league players for years. Whatever skills they have displayed are the skills valued by team management (they did acquire both players on purpose, after all).

    And for the first 3 months of the season, they both hit a cold stretch, rather than displaying actual declining skills countered by a recent hot streak. If that’s the case (though I doubt it, and I’m sure you guys do too), then a turnaround isn’t just a hot streak, it’s a return to an actual display of skill.

    So if the “cold stretch” was a reason to reduce their playing time (Baker was aboard that bandwagon, and the callup of Jones indicates that management was as well), then why DOESN’T a hot stretch negate that?

    In other words, if USSM was on board with reducing ABs for the cold guys, isn’t that the flip side of the same argument for maintaining ABs for hot guys?

    Certainly there’s an argument for playing Jones just based on defensive aptitude, so I’m not saying he should sit, but getting on the team’s back for playing the hot hand when you guys were driving the wagon to get the cold guys run out of town (figuratively, of course) just screams to me, “I’m missing something here!”

  35. pygmalion on August 20th, 2007 9:46 am

    I’m glad to see the data on this. I’ve always felt it to be true, on analogy with cold streaks. As a poster pointed out, everyone accepts cold streaks by an all-star as anomalous. So hot streaks must be the same, one would reason. Glad to have the evidence of it.

    I think that you should move the point that we ought always to have been depending on three year data to the post. It makes a significant difference to how we should understand your statement that we shouldn’t judge Ibanez differently today than on July 31st.

  36. OscarM on August 20th, 2007 9:47 am

    Isn’t part of the problem that streaks, hot or cold, are by definition statistical anomalies? Anomalies are not predictable or sustainable. This gets compounded by someone like McClaren because when a streak suddenly matches his expectations he overvalues it.

    Little Mac wants his veterans to succeed because he likes them. Geoff Baker also likes them and wants them to succeed. Mariner fans in general want them to succeed. It becomes a perfect storm of self-gratification.

    You can produce these statistical arguments all you like. I would be shocked if Ibanez, in particular, gets any significant downtime now even when the “streak” ends, if it hasn’t already.

  37. brian_sun on August 20th, 2007 9:48 am

    There are 41 games left this season. That’s roughly 140 AB that a full time starter will have. I don’t think what Raul did in his last 48 AB or what Vidro did in his last 115 AB warrants riding them out for the entire 41 games. Richie Sexson is a different story, since the guy hasn’t had ANY hot streaks in the first 121 games. The entire reason he’s still playing is because of his 14M salary this year and next year. These last 41 games basically make or break your season. You can’t afford to wait for these veteran players to come out of their slumps when they get into one. I think you divide the 41 games into four 10-game intervals. You can’t allow a guy to be cold for more than 10 games. Raul and Vidro have performed well, let’s give them the next 10 games. Richie Sexson has sucked for the first 120 games, stick Ben Broussard at 1B for the next 10 games…

  38. Dan W on August 20th, 2007 9:52 am

    I too look forward to the projection post. “Hot streaks” abound! Besides Vidro (sizzling) and Ibanez (scorching), Betancourt is 377/397/689 (en fuego) in August! With 2 BBs! Stay positive!

  39. Seth on August 20th, 2007 9:52 am

    I guess now that we’ve accepted that Ibanez was just having bad luck…what about Sexson? Is he deviating to the norm, or is he kaput? Or what?

    I mean, it’s not like there aren’t dozens of power hitters who have kept truckin’ after 32–Papi, Chipper Jones, Mags Ordonez, all slugging over .550 this year.

    Then again, Sexson falls into that “bad athletic skills” category like Alvin Davis and Don Mattingly, right?

  40. Tek Jansen on August 20th, 2007 9:55 am

    Dave, or anyone else who might be interested in answering, is there any way to predict future defensive performance? A lot of the clamoring for Jones to be put in LF rested on his defensive value. Do people see Raul’s catch on Saturday and the Jones drops as predictive of future defensive performance?

  41. Dave on August 20th, 2007 9:55 am

    I guess now that we’ve accepted that Ibanez was just having bad luck…what about Sexson? Is he deviating to the norm, or is he kaput? Or what?

    Again, we’ll talk about this in the projection post, but that’s not what we’re accepting.

    We’re accepting that part of his decline was bad luck, but part of it was also an obvious (and totally predictable) age-related decline in skills. If you look at the community projections for Ibanez from before the season, we all saw him taking a pretty significant step back from his 2006 season. There’s no reason to expect him to hit at his prior year levels.

  42. rrose on August 20th, 2007 9:56 am

    34

    The cases against Ibanez and Vidro weren’t based on recent drop-offs in performance (“recent” defined here as 2007). The primary issue with Ibanez in particular has been his liability in the field.

  43. Aaron on August 20th, 2007 9:56 am

    To expand on my #34 (because I’m never as clear as I’d like to be):
    “If you, like Geoff Baker did, believed at the end of July that Adam Jones was a better player than Raul Ibanez and should be taking the field everyday, then nothing that has happened on the field since then should change your opinion. Raul Ibanez isn’t any more likely to hit well tonight than he was three weeks ago.”

    But what if we only thought Ibanez was done because of this very same “recent performance” bias? Using the very same analysis you laid out, somebody could make a (weaker, but perhaps valid) case that a couple hundred ABs early this season shouldn’t diminish his 3-year average enough to come to the conclusion we all came to.

  44. Dave on August 20th, 2007 9:56 am

    Dave, or anyone else who might be interested in answering, is there any way to predict future defensive performance? A lot of the clamoring for Jones to be put in LF rested on his defensive value. Do people see Raul’s catch on Saturday and the Jones drops as predictive of future defensive performance?

    It’s all about repeatable skills. Range is a very repeatable skill. Jones’ drops and Ibanez’s catch should do absolutely nothing to change your evaluation of their respective defensive abilities.

  45. Bernoulli on August 20th, 2007 9:57 am

    I wrote up a nice long comment about Kevin Mench’s shoes, but it doesn’t look like it needs to be posted. Great job, Dave, and thanks for writing this.

  46. Jeff Nye on August 20th, 2007 9:59 am

    Personally, I still have the feeling (read: I make no claims of having supporting data) that Ibanez is teetering on the edge of the cliff.

    It’s interesting to me how really invested people are in the whole “hot streak” theory. I’ve seen many claims of “anyone who actually watches the games can obviously see that Player X is hitting the ball harder”, that don’t seem to be supported by any real scouting information.

  47. rsrobinson on August 20th, 2007 10:00 am

    If there was some kind of meter or crystal ball to determine when a hot streak ended I guess you could make an argument to bench Ibanez or Vidro when that point was reached. It still doesn’t make much sense to do it while they’re hitting well.

    USSM conventional wisdom on Vidro has been wrong now for five weeks and counting. Perhaps today is the day it starts being right, but who knows? What I do know, though, is that probably every person in that Mariners clubhouse believes that both Vidro and Ibanez are swinging the bat well and deserve to be in the lineup.

    You can’t take the human element out of baseball. You can’t bench players who are hitting well because of some statistical theory that provides an educated guess about how they MIGHT perform in the future. You do that and you leave players uncertain of what they have to do to earn playing time and can wreck the morale of the team. If players can play their way out of the lineup by poor hitting (unless your name is Richie Sexson) then players have to know that they can play their way into the lineup, too. It’s been that way in baseball for 100+ years and will probably be that way for another 100 years.

  48. Dave on August 20th, 2007 10:03 am

    Way to just totally miss the point of the whole post and ignore the final paragraph to boot.

  49. BLYKMYK44 on August 20th, 2007 10:04 am

    - So as a manager how would this data be used in a practical purpose?? While it may be true that a hot streak is not predictive of the future there has to be some sort of reward for performing well…

    - Also, does that mean there are significantly fewer 10-game “hot streaks”…at what point should a fan consider a hot streak an actual reflection of probable future performance?? I would consider Adrian’s great 48HR year…for a while it was just a “hot streak” and it just kept going and going. When does that transition actually happen?

  50. Jeff Nye on August 20th, 2007 10:06 am

    To address something that keeps coming up (that Dave touched on but I think warrants repeating):

    The reason that the early-season “cold streaks” by Ibanez and Vidro are given a bit more weight, is that they fit with what we know pretty well about how baseball players’ skills decline with age.

    “Hot streaks”, on the other hand, are inherently a deviation from a player’s normal aging curve, and thus should be viewed with more skepticism than a “cold streak” that is more likely to be the onset of skill decline in an older player than an isolated instance.

    The same thing can be applied to younger players that go on a “hot streak”; a short-term improvement in their skills is more likely to be a harbinger of them increasing their skillset than it is to be an isolated instance.

  51. Dave on August 20th, 2007 10:06 am

    So as a manager how would this data be used in a practical purpose?

    A good manager would never start Raul Ibanez against a left-hander, much less hit him cleanup, simply because he’s “on fire” right now.

  52. scraps on August 20th, 2007 10:06 am

    rsrobinson, the human element has been acknowledged over and over again. It’s been acknowledged right here in this thread, by Dave and others. We agree that it is at least difficult to sit Ibanez or Vidro while they are streaking.

    Do you understand yet what people have been trying to tell you about hot streaks not having predictive value? It doesn’t seem like it, since you continue to say that USS Mariner “conventional wisdom” has been “wrong”. That makes as much sense as saying that someone is wrong if they tell you a guy is a bad hitter and the guy hits a home run. What Dave is writing is analysis. “Conventional wisdom” is in fact what you keep espousing.

  53. Bernoulli on August 20th, 2007 10:08 am

    So as a manager how would this data be used in a practical purpose?? While it may be true that a hot streak is not predictive of the future there has to be some sort of reward for performing well…

    Why? Because they won’t play well unless they’re given a cookie?

  54. Goob on August 20th, 2007 10:09 am

    Great post, Dave. I’ve got a question for ya though. You sometimes hear that players perform better when their job is on the line, kinda similar to the old “it was a contract year” argument used by some to explain Beltre’s 2004 performance. Do you know of any study that compared before and after results of struggling players who had a prospect or new acquisition come in that threatened their playing time?

    You hear it a bunch in football when a veteran QB is brought in to challenge the rookie in an effort to “raise his playing level.” A few buddies of mine say this is why Ibanez and Vidro are suddenly playing better, because they know if they don’t, Jones will take their spot. It’s maddening listening to them! But I guess it’d be interesting to see if there’s any credence to it.

  55. Manzanillos Cup on August 20th, 2007 10:16 am

    USSM conventional wisdom on Vidro has been wrong now for five weeks and counting.

    You mean the consensus here that Vidro is basically a singles hitter who draws a few walks but has no value in any other facet of the game? Hmmm, don’t see how that’s been wrong.

  56. rsrobinson on August 20th, 2007 10:17 am

    Alright, predictive power makes sense when rolling dice because, given enough rolls, you can predict within a very narrow degree of probability how often each number comes up. When it comes to humans projecting future performance is basically just an educated guess because there are too many unknown variables involved.

    A hitter might actually be performing better, not because he’s hot, but because he’s made mechanical changes in his swing that improves performance. Or he’s recovered from injury and is healthier. Or he’s simply improving over time to more fully reach his potential.

    And is there anything in that study that measures the length of a hitter’s streak? If one guy is hitting well for 20 games, another for 30 games, and another for 40 games then the study does little to accurately predict future performance after either 20 games or 30 games.

  57. zzyzx on August 20th, 2007 10:19 am

    50 – the problem with that though is that it could easily create models where we overestimate young players and underestimate old ones. At some point every old player will eventually lose the skills that let them play in the majors, but we have to be careful that we don’t spend so much time looking for that that we overreact ourselves.

  58. CCW on August 20th, 2007 10:20 am

    STRAW MAN ALERT: “Baker believes in the predictive power of the hot hand.” Did Baker say that? I don’t think it’s true. What Ibanez’s recent hot streak has demonstrated is that he is probably not as cooked as we – you, me, Baker, the USSM community – thought he was.

    Honestly, didn’t you think there was a pretty good chance that Raul had completely fallen off the cliff, never to return? Me, too. He looked *awful* at the plate. And with the distinct possibility that Raul was truly cooked, it made a lot more sense to give Jones his at-bats. But the past 50 at-bats have changed that belief. I now think it’s less likely that Raul will post a .687 OPS going forward than I did previously, and that is a perfectly logical, rational belief. Dave, if you honestly think of Raul right now the same way you were thinking of him 50 ABs ago, I would be surprised. And I’d argue that doesn’t make any sense.

    Anyway, I don’t disagree with your point about hot streaks and cold streaks, but I think, as it relates to the M’s and Baker’s posts, it is a straw man set up to refute an argument that no one is really making.

  59. Jeff Nye on August 20th, 2007 10:24 am

    zzyzzxxzzrrrr (sorry, just poking fun):

    That’s a fair caveat, and I think it’s pretty obvious that none of what we’re talking about here is meant to be 100% exact (well, obvious to everyone but one person), but I’d definitely prefer to err on the side of overvaluing young players than old ones, because in the main you’re more likely to be right when you project a sudden uptick in performance for a younger player than an older one.

    There’s been a lot of good research done on aging curves for baseball players, and everything I’ve seen indicates to me that it’s a lot more likely that someone will “figure it out” at 22 than it is at 38.

  60. gwangung on August 20th, 2007 10:24 am

    A hitter might actually be performing better, not because he’s hot, but because he’s made mechanical changes in his swing that improves performance. Or he’s recovered from injury and is healthier. Or he’s simply improving over time to more fully reach his potential.

    So, how do you tell?

    Alright, predictive power makes sense when rolling dice because, given enough rolls, you can predict within a very narrow degree of probability how often each number comes up. When it comes to humans projecting future performance is basically just an educated guess because there are too many unknown variables involved.

    Except, you just AGAIN ignored what Dave posted.

    Given the data there, those “unknown variables” DON’T MAKE A DIFFERENCE in future performance. That’s an empirical observation.

  61. lailaihei on August 20th, 2007 10:26 am

    Hey Dave, what predictive stats do you like to use? You say 3 years max, is that a .5/.3/.2 split or what?

  62. Steve T on August 20th, 2007 10:27 am

    Thanks for a great post, Dave. Humans love seeing patterns where none exist; it’s one of our defining characteristics. So is the ability to actually do the research to correct overinterpretation. Sometimes the animals in the clouds are just clouds!

  63. fetish on August 20th, 2007 10:30 am

    [no]

  64. davepaisley on August 20th, 2007 10:33 am

    13 – “if you’re allowing 48 at-bats to significantly alter your projection, you’re overvaluing current year data in lieu of prior year data.”

    So the fact that Ibanez’ last three years (his second career in Seattle) are:

    2004 .825 (month splits from .645 to 1.167)
    2005 .792 (.688 to .851)
    2006 .869 (.739 to 1.109)

    2007 .788 (.503 to 1.333 (Aug incomplete))

    …should show us that he’s simply regressing up to the mean (currently sitting at .788)

    He’s unlikely to maintain 1.333 for August, but should finish over 1.000 at least. So he could have the worst and best months of his second Mariner career back to back.

    Allow a little bit for age related decline and he’s well within the range of expectancy so far this year.

    There is significant evidence that Ibanez has been nursing nagging injuries (probably well known by the coaching staff). The likelihood is that the injuries will recur sooner rather than later, but he could well dodge a bullet and be productive the rest of the year.

    Vidro’s a little trickier, in that his performance boost is entirely due to hitting over .400 for an extended period of time – something that has proven to be impossible for anyone to sustain over the long term. So we *ought* to know that Vidro’s run is entirely luck, especially as it’s all singles.

    Actually, that would be an interesting set of streaks to look for – how long have any hitters been able to sustain a streak of .400+ hitting?
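
    For reference, the three-year baseline being discussed can be sketched from the season OPS figures quoted in this comment. The .5/.3/.2 recency weights are only the split asked about in #61, not a confirmed method, and this sketch ignores age adjustment and playing time.

```python
# Season OPS figures quoted in the comment above.
ibanez_ops = {2004: 0.825, 2005: 0.792, 2006: 0.869}

def weighted_ops(ops_by_year, weights):
    """Weighted average of season OPS; `weights` are listed most recent season first."""
    seasons = sorted(ops_by_year, reverse=True)  # most recent season first
    total = sum(w * ops_by_year[year] for w, year in zip(weights, seasons))
    return total / sum(weights)

simple = weighted_ops(ibanez_ops, [1, 1, 1])
recency = weighted_ops(ibanez_ops, [0.5, 0.3, 0.2])  # hypothetical weights from #61
print(f"simple mean: {simple:.3f}, recency-weighted: {recency:.3f}")
```

    Either way the baseline lands in the .83 range, which is the figure any age-related decline would then be subtracted from.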

  65. zzyzx on August 20th, 2007 10:34 am

    59 – don’t get me wrong, I share your bias. However, I don’t know if we know enough to know the difference between a slump and falling off the cliff. Was the probability that Raul was effectively done on 7/31 10% or 50%? If I were in a major league front office, that’s what I would be trying to research, because if you had an advantage in knowing the odds that a player was (or wasn’t) done, that could be a huge advantage.

    Hey, seeing how we root for a team that overvalues vets, it would be to our advantage to get conventional wisdom to undervalue them so we’d get better deals. Hmmmmmmmmmmmmm *rubs hands together*

  66. whwang on August 20th, 2007 10:38 am

    What I read from this post is that hot streaks have very little (if any) predictive power compared to the long-term average of the player’s past record. I totally agree with this.

    However, in the specific case of Ibanez this year, while I don’t think we should take his recent hotness too seriously, we also need to ask which part of his past record we should trust. The 05-06 Ibanez? Or the first-half-season Ibanez in 07? If the pre-July 07 Ibanez performance is caused by injury, can we view the recent hot streak of his as a sign of improved health and more or less expect him to perform like the pre-07 Ibanez?

  67. Jeff Nye on August 20th, 2007 10:41 am

    I don’t want to get too much into specifics about certain players, because Dave asked that this thread not turn into that and it’s already veering dangerously close. (Dave, if you want to squash this post to prevent further derailing, I don’t mind)

    Suffice it to say that I always was a proponent of keeping Ibanez’s bat in the lineup, just at DH, and moving You Know Who to the bench.

    Most of the current knock on Ibanez is that he isn’t a good defender anymore, and I saw more evidence of this at the game yesterday, watching him have to expend a huge amount of effort to make a couple of plays that would’ve been routine for someone who can actually run. His bat, however, still has value.

    So, bringing it back to the topic at hand, I’m not sure that there was ever a huge argument that Ibanez’s “cold streak” was indicative of him no longer being able to hit the baseball. I think he’s still going to be a reasonably productive major league hitter in the scope of this entire year.

    Most of why many of us wanted to see AJ in left had more to do with the defensive gain than offense, and I’m personally not prepared to discount his track record of excellent outfield defense in the minor leagues based on a couple of bad plays in the majors.

  68. Chris Miller on August 20th, 2007 10:46 am

    I think Raul is in clear decline, but I think part of it is good old Regression to the Mean. Sexson will probably do the same, come back, but with some decline. Sometimes a streak can last an entire year. Adrian Beltre was hot for 04 then regressed, and Mike Lowell was awful in 05, then regressed as well.

    Keep in mind ALL years and careers are composed of streaks that create an average. Sometimes the streak only lasts 1 PA, but overall somebody’s averages are the result of a series of streaks. Good spells and bad spells weigh each other out. Ichiro loves to alternate between Incredible and Mediocre.

  69. rsrobinson on August 20th, 2007 10:46 am

    After re-reading Dave’s post I’m still confused about whether the study is limited to measuring performance after seven games or if it only began measuring once it was identified that players had “cooled off”. I see problems with both.

    If some players cool off after seven games while others cool off after 15 or 20 or 30 games, then the study does little to predict future performance after, say, ten games.

    And what happens when a hot streak goes well past seven games, as it has with both Vidro and Ibanez. Does that mean that factors other than just being hot are involved? Or is that considered a failure to accurately predict future performance, at least over the short term?

  70. Chris Miller on August 20th, 2007 10:50 am

    I think an entire season is not enough data to make a true talent judgment on, let alone a month or two. More often than not, if someone heats up way above their expected level, even for the whole year, the next year they regress. Sometimes there is a skills based component, and they don’t regress as far, but how many times has a guy just had a career year, then never did it again? Raul 06 comes to mind.

  71. Chris Miller on August 20th, 2007 10:52 am

    Even if the skills have clearly changed (ie, a guy is REALLY getting wood on it, hitting the ball a mile out of nowhere), I’d still be leery until he did it for a year (or more).

  72. Mike Honcho on August 20th, 2007 10:53 am

    I think the question that needs to be asked is: how Tango and Co. calculated “expected” w/OBA?

  73. Chris Miller on August 20th, 2007 10:57 am

    Probably regressed statistics from before the streak, w/o looking.

  74. Chris Miller on August 20th, 2007 10:58 am

    A hunch would be something akin to Marcels. But I’m not them, can’t speak for them directly.

  75. ghug on August 20th, 2007 11:02 am

    It is possible that Ibanez had a long cold streak (possibly due to injury), and then a hot streak, and now he will return to normal (we can hope can’t we). If you want to project performance accurately, in my opinion, you have to take into account age, several seasons of stats, and most importantly skillset. The book shows that, somewhat.

  76. BLYKMYK44 on August 20th, 2007 11:14 am

    Would anybody be able to define when a hot streak becomes actual performance??

  77. Chris Miller on August 20th, 2007 11:21 am

    I’d think regression and scouting are the keys to understanding change in performance.

  78. smac on August 20th, 2007 11:21 am

    Dave,
    One question I have after reading this is, do you buy into hitting players against pitchers they hit well / sitting them against pitchers they have hit poorly? i.e. Raul vs. Santana (I couldn’t believe he was in the line-up for that game, and I couldn’t believe he had such a nice average vs. Santana). Are those face to face numbers just small sample size/hot streak that should be given no credence, or would you argue that those become predictive?

  79. Jeff Nye on August 20th, 2007 11:26 am

    Well, that’s the million dollar question.

    I don’t think there’s any way to draw a line in the sand saying “this is when it stops being a fluke and starts being a change in expected future performance”.

    I think Dave touched on the topic briefly earlier in the thread, though, when he said that you need to evaluate the skills rather than the results, with the implication that looking at just results will never allow you to tell the difference between those two things.

    So basically, the idea is that you ignore the results-based “hot” and “cold” streaks, and use scouting and carefully selected stats that tell you useful things about a player’s skillset to try to identify the difference.

  80. Chris Miller on August 20th, 2007 11:27 am

    #78, the book covers that too, and the answer is there’s no predictive value in batter pitcher matchups either, beyond the expected results based on the batter, pitcher, handedness, and batted ball tendencies (ie, a groundball pitcher against a slow guy) of each.

  81. DMZ on August 20th, 2007 11:28 am

    Generally speaking, face to face numbers have no value.

    That doesn’t mean that you can’t play matchups — Earl Weaver talks about that: if you have a guy who is particularly good at turning on fastballs, you want them up against a reliever who doesn’t have a decent breaking pitch.

  82. arbeck on August 20th, 2007 11:29 am

    It seems like 90% of the questions asked in comments on the site could be answered by just reading The Book. Either everyone needs to buy a copy, or I’m going to have to start carrying my copy around so I can refer to it all the time.

  83. Dave on August 20th, 2007 11:29 am

    When it comes to humans projecting future performance is basically just an educated guess because there are too many unknown variables involved.

    An educated guess is better than an uneducated guess, no? No one’s saying that our ability to project human performance is perfect. We’re just saying it’s better than anything else in use at the moment.

    Everyone’s projecting performance – John McLaren, Bill Bavasi, Geoff Baker, you – everyone. We’re just getting there in different ways, and the accuracy of our projections will be affected by the process we use to come up with our expectations. I think history shows that using statistics correctly, and understanding which ones matter and which ones don’t, will give greater accuracy than betting on intangible myths.

    If the pre-July 07 Ibanez performance is caused by injury, can we view the recent hot streak of his as a sign of improved health and more or less expect him to perform like the pre-07 Ibanez?

    Again, can anyone give me a reason why August 2nd Raul Ibanez was too hurt to be effective but August 3rd Raul Ibanez is Super Awesome Power Hitting Raul? The injury idea, which may or may not be true, is predicated around using performance to figure out when he got healthy, then suggesting that health is the reason for the performance. It’s classic circular reasoning.

    After re-reading Dave’s post I’m still confused about whether the study is limited to measuring performance after seven games or if it only began measuring once it was identified that players had “cooled off”. I see problems with both.

    The data identified any five game stretch (with 20+ PA) where a player performed at a high level, regardless of what followed immediately afterwards. I hoped the Magglio Ordonez example made this clear – the players were not selected as guys who hit well and then cooled off.

    That’s why there are 6,000 5+ game samples and 543 players – there are many instances of the same player, some of them overlapping. Magglio Ordonez’s 10 game hot streak would include six different five game hot streaks (1-5, 2-6, 3-7, 4-8, 5-9, 6-10), all of which would be placed into the hot bucket.
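
    The overlapping-window counting described above can be sketched as follows. The function name is made up for illustration, and it only enumerates the windows; the study’s actual qualifying criteria (the 20+ PA floor and the performance threshold) are not modeled here.

```python
def five_game_windows(streak_length: int, window: int = 5):
    """Return (first_game, last_game) pairs, 1-indexed, for every
    consecutive `window`-game stretch inside a streak."""
    return [(start, start + window - 1)
            for start in range(1, streak_length - window + 2)]

# A 10-game hot streak contains six overlapping five-game stretches.
windows = five_game_windows(10)
print(windows)
```

    This is why a single long streak contributes several samples to the hot bucket rather than one.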

    And what happens when a hot streak goes well past seven games, as it has with both Vidro and Ibanez. Does that mean that factors other than just being hot are involved? Or is that considered a failure to accurately predict future performance, at least over the short term?

    Will you just never allow random variation to enter your mind as a cause for anything?

    I think the question that needs to be asked is: how Tango and Co. calculated “expected” w/OBA?

    They took all the players in the hot bucket, created an average based on their historical three year totals and weighted them by plate appearances. Without just totally appealing to authority, there aren’t too many reasons to believe that any of us here know how to conduct a baseball research study better than those three.
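
    As a rough illustration of that weighting, here is a minimal sketch of a plate-appearance-weighted average. The numbers are made up, and using the PA from each hot stretch as the weight is an assumption; the comment above does not spell out exactly which plate appearances serve as weights.

```python
def expected_woba(players):
    """PA-weighted mean of each player's historical wOBA.
    `players` is a list of (historical_woba, plate_appearances) pairs."""
    total_pa = sum(pa for _, pa in players)
    return sum(woba * pa for woba, pa in players) / total_pa

# Made-up entries for illustration only: (three-year wOBA, PA used as weight).
hot_bucket = [(0.380, 25), (0.340, 22), (0.360, 21)]
print(round(expected_woba(hot_bucket), 3))
```

    The observed wOBA after the hot stretch can then be compared against this expected figure for the same group of players.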

  84. darrylzero on August 20th, 2007 11:31 am

    Dave said a couple of times that he thought Raul was probably finished as a power-hitter. I think if we’d asked him how sure of that he was, he probably wouldn’t have felt too sure, but very nervous, particularly because he knew it would take the Mariners a very long time to realize if he were really done. I was mostly in the same boat, though I have much less confidence in my own analysis than he does (for very good reason).

    He’s already said mea culpa about overvaluing 2007 with regard to Ibanez as a hitter, and I think he’s laid that out clearly and sensibly in the comments here (i.e. we should still expect some decline due to age, but we can’t just ignore last year either). But he hasn’t spelled out the skillset issue with regard to how that would play with Ibanez, which I think has been the biggest part of what has been driving his analysis. The statistical consensus about how bad Ibanez is in the field is overwhelming, and he hasn’t hit lefties well in a long time (and not much at all since becoming a legitimately good hitter of RH pitching).

    So, Raul’s skillset is platoon DH. He’s the important half, he’s well-suited as a hitter to his home park (though not as a fielder, obviously), and if he gets back to what we might have expected from him this year before the season began, he’ll be a very very good platoon DH. I hope so. That makes him better than Ben Broussard. But not overwhelmingly better. They’re more similar players than we might want to admit. Certainly the drastically different usage pattern is a little odd, given the similarity.

    Also, fetish, I don’t know if you’re being sarcastic, but it’s worth remembering that Jones played well in center at AAA all year this year. We have a lot more data to look at than just his appearances with the Mariners that suggests that his defensive miscues will probably not continue. I’ll admit to being nervous about them myself, but we should acknowledge that it’s not a very reasonable nervousness.

  85. Chris Miller on August 20th, 2007 11:33 am

    There’s lots of information to help you make those kinds of judgments. If a guy is hitting for serious average out of nowhere, and his LD% is 27%, there’s a good chance it’s a complete fluke since LD% is fairly random, and those kinds of rates are unheard of for extended periods. If HR/F is spiked, you could visit Hittrackeronline.com and view the HR’s; a bunch of “Just Enoughs” would indicate HR’s are barely clearing the fence. If there weren’t that many of them and/or there were a bunch of “No Doubts”, and that continues for an extended period (not sure the actual regression to apply in that case), then maybe it’s not completely a fluke. If a guy alters his swing to pull it more, so he can clear a short porch at home (Raul 06?), then maybe that can be maintained for some period. I’d be wary of drawing conclusions too quickly, even in the face of visible changes, because some of the things can’t be easily regressed, which is where scouting comes in. Sometimes changes in mechanics can lead to permanent change going forward, but sometimes people revert to old habits, or get exposed to holes and get forced to go back to what it was they were doing.

  86. carcinogen on August 20th, 2007 11:34 am

    Baker has responded:

    http://blog.seattletimes.nwsource.com/mariners/

    He brings the dialog from the abstract to the specific W/R/T Sexson, Ibanez, and Vidro. I look forward to Dave’s reply.

    I would also like to comment on how lucky we are that the Ms discussion in this town has become what it has as compared to years past.

  87. Dave S. on August 20th, 2007 11:35 am

    There’s a certain amount of subjectivity that has to go into this argument – when to determine whether this represents a real change in performance. And there’s also the understanding that, while the hot hand may not directly mean anything, it’s kind of difficult to bench a player when he’s producing.

    Reality is: we’ve got one day off in the next month and a half. Vidro and Ibanez have started to hit. Jones will play, because our players will definitely need rest.

    But I’m going to find it very hard to fault McLaren for rewarding production from his regulars.

    Sticking with Richie Sexson, on the other hand? Stubborn and moronic.

  88. JMHawkins on August 20th, 2007 11:35 am


    My own personal thought is that you need a pretty substantial sample size before you can start saying that someone has “turned it around”, on the order of a half season if not more. Two weeks doesn’t tell you anything.

    One of the things that I’ve been gravitating towards for a few years, and am now firmly in the boat of, is evaluating changes in skills rather than results. I believe that any sustainable deviation in results will be the byproduct of a change in skills, and through a better understanding of what statistics to look at as well as quality scouting information, I think we can identify skills changes that will allow us to see what players have actually improved, rather than which players are just riding a nice wave.

    What if Raul’s current performance can be attributed to recovery from injury or the correction of bad habits developed as a result of compensation for injury? Do you think a distinction can be made between what is seemingly a random spike in performance and increased performance having an underlying cause?

    Sure – the injury factor is one of the main legitimate causes for deviation in performance. However, I think we need to be careful in just randomly assigning the injury excuse to any player who suddenly sees a change in performance. What changed with Raul Ibanez on August 3rd that wasn’t true on August 2nd, for instance? Or how is his claim that he’s finally healthy this time any different from the one he made after hitting two home runs in Cleveland back in June?

    To tie the two together, if Raul’s recent improvement is due to recovering from an injury, there should be some visible improvement in a skill that we can measure. The key is finding the right measurements. Perhaps it’s something like line drive percentage or HR/FB rate, or something that we already have access to. Or perhaps it’s average speed of the ball off the bat, or something of that nature that isn’t readily available. Despite over a century of baseball statistics, we’re still evolving new and better ways to measure results.

    My personal take is that Raul probably is recovering from at least two injuries – shoulder and hamstring – and that he is hitting the ball better due to actual improved skill. Earlier in the year, his problems seemed to be more a lack of power than a lack of ability to see or make contact with the ball. Now, it appears the ball has a little more giddyup when it leaves his bat. Shoulder + hamstring injuries and loss of power. Hmmm, it’s reasonable one might cause the other. Maybe in his earlier “recovery” he reinjured the shoulder and fell back into the hole?

    But notice the words “seemed”, “appears” and “maybe” in the above. I don’t have the right data to say “was”, “is” or “did”. I’m just guessing.

  89. Mike Honcho on August 20th, 2007 11:35 am

    They took all the players in the hot bucket, created an average based on their historical three year totals and weighted them by plate appearances. Without just totally appealing to authority, there aren’t too many reasons to believe that any of us here know how to conduct a baseball research study better than those three.

    Thanks. I was curious as to how far back they went.

    Geoff Baker’s response to this post is up. He says he agrees with Dave, but then completely misses the point when he makes his argument for keeping Sexson in the lineup. And he does the same to a lesser extent for Ibanez and Vidro.

  90. Chris Miller on August 20th, 2007 11:36 am

    #81, good point, it’s kind of what I was getting at (but not specifically), platoon situations.

  91. darrylzero on August 20th, 2007 11:37 am

    Maybe another hypothetical example would get us off the Ibanez tip for a minute, which might allow us to think a little more objectively. If you’d rather not go down this road, feel free to delete or just not respond, but I’d like to see a brief outline of how you would apply this to evaluating the prospects of J.D. Drew hitting this year. Maybe that should wait for the projection post, though. Apologies if so.

  92. Chris Miller on August 20th, 2007 11:38 am

    Raul has been injured, or at least has said he has been, so I suspect that’s part of what was going on.

    Sometimes players who aren’t playing well will just say “this and that” caused the slump though.

  93. Chris Miller on August 20th, 2007 11:40 am

    #91, you DO want to weight the most recent performance, more than say 3 years ago, just not as much as people do. I think JD Drew is not as good as the guy Boston picked up, but not as bad as this season suggests.

  94. Chris Miller on August 20th, 2007 11:41 am

    I think JD Drew is not as good as the guy Boston picked up

    Should read:
    I think JD Drew is not as good as the guy Boston THOUGHT THEY picked up

  95. carcinogen on August 20th, 2007 11:48 am

    89: Dave is using a very common analytical approach. He’s gotten Baker to agree with a proposal in the abstract (the jab), next he’ll hit him with the specific examples (the right cross), and Baker will have to relent.

    Moooohaahahaaahaaaa….ok, so I overstate. But damm I love this stuff!

  96. Dugan on August 20th, 2007 11:48 am

    OscarM Says: … You can produce these statistical arguments all you like. I would be shocked if Ibanez, in particular, gets any significant downtime now even when the “streak” ends, if it hasn’t already.

    Amen – IMHO, Johnny Mac has decided that Ibanez is the better option and Jones is going to ride the pine most games.

  97. Jeff Nye on August 20th, 2007 11:49 am

    Yeah, I wasn’t that impressed with Geoff’s response myself, despite him having the best first name ever (it’s misspelled, though!)

    He seems entirely too willing to dismiss You Know Who’s last two awful years as being entirely due to injury rather than decline, while believing that his post-ASB “hot streak” (which I’d define as more of a lukewarm streak) is predictive of his near-future performance.

    The tone of his post also implied a disdain for AAA statistics, in regards to Adam Jones, which I can’t quite fathom.

  98. DMZ on August 20th, 2007 11:50 am

    As Dave requested — could we perhaps not make this about the M’s, or even the particular players in question, or Jones v Ibanez or whatever?

    The post is about the predictive value of hot streaks. All other questions will be answered in due time.

  99. Robo Ape on August 20th, 2007 11:52 am

    First of all, great post Dave; probably my favorite of the season if only because, as an anthropologist, I’ve read countless papers about this phenomenon and its (typically incorrectly used) role in human decision-making. There are a couple of things I want to bring up, however.

    To begin, the most famous example from the academic canon investigating the phenomenon of human beings erroneously predicting future success based on perceived recent success is Gilovich, Vallone, and Tversky’s “The Hot Hand in Basketball: On the Misperception of Random Sequences.” (Cognitive Psychology, 1985). Fundamentally, the argument in the paper is the same you are making here but with an important distinction; specifically, that the concept of hot streaks in basketball is a fallacy in and of itself. Other than extreme outliers, a basketball “hot-streak” does not truly exist in the way sports fans perceive it to. In basketball, the streaks (either hot or cold) are rarely sustainable for any significant portion of time. As a result, the tendency for basketball teams to give the ball to a player who is “hot” is a negligible or detrimental decision. As JLC says in 33: “The… human tendency is to see patterns, even when there are none, and that screws us up all the time.” This is so, so true.

    That said, I know that numerous cognitive psychologists, economists, and social anthropologists, after reading the Gilovich, Vallone, and Tversky paper, attempted to apply the argument to baseball but ran into a problem. In baseball, unlike basketball, true hot and cold streaks do seem to exist. This creates a massive problem for predictability because of the moving window effect. Since statistically significant hot-streaks do occur in baseball, on a fundamental level at least, when a player is “hot,” fans are often correct in thinking he will continue to do well. They are just as often, of course, incorrect. The trouble comes from where a hot-streak begins and ends. We might have, for instance, decided to bench Raul after his hot streak prior to the most recent home-stand. So far as I can tell, that would have been a mistake. Of course, a hot streak can’t last forever, but the point is that players do get hot, which even this analysis acknowledges.

    I’m not arguing against anything you’ve said, really, but I think this is an important distinction to consider when analyzing the hot hand fallacy in baseball as opposed to other situations.

  100. panman on August 20th, 2007 11:55 am

    Very thoughtful post as always. But your argument boils down to “regression to the mean” which if applied to Ibanez would indicate his performance across the remaining 6 weeks should be enough to offset his statistically aberrant performance for most of the season and deliver a set of results consistent with his previous five seasons. So the conclusion is; play Raul…….

  101. DMZ on August 20th, 2007 11:56 am

    I don’t believe that’s true. There hasn’t been, to my knowledge, a well-done study that shows any kind of hot/cold streak effect in baseball.

    I’ll again refer everyone to “Curve Ball” by Albert & Bennett, which is entirely about this sort of thing and how random variation can produce what appear to be real hot and cold streaks (but are not).

  102. Rusty on August 20th, 2007 11:56 am

    I was sitting in the RF bleachers for the ninth game of Ken Griffey, Jr’s consecutive home run hitting streak. Just shrugged my shoulders when the game was over and I didn’t get a chance to scramble for a ball.

  103. DMZ on August 20th, 2007 11:56 am

    His argument doesn’t boil down to that at all.

  104. Edman on August 20th, 2007 11:57 am

    As was illuded to earlier…..GOOD statistics don’t eliminate the good data, if it doesn’t ALSO eliminate the bad data.

    Since the idea seems to be that hot streaks are a fluke, then so must cold streaks. If you want to play the SPC game, eliminate both, on either side of the equation (ie: 20% good and bad) leaving you with a result that shows you TRUE average performance.

    It’s silly to eliminate only the good performances……and keep the bad ones……..and think you have some form of legitimate evaluation tool. IMHO, it’s simply making up statistics that point in a direction you want them to…..to eliminate objectivity.

  105. DMZ on August 20th, 2007 12:04 pm

    I don’t understand that at all.

  106. argh on August 20th, 2007 12:05 pm

    I’ve got ‘Curve Ball’ coming at the library and I see ‘The Book’ is available at local bookstores — which one would make the most sense to read first?

  107. Jeff Nye on August 20th, 2007 12:05 pm

    In the spirit of moving this discussion in the direction of non-Ms specific things:

    Does anyone have a good example of a player who was on an extended “hot streak” and then regressed to their expected sucky performance?

  108. Graham on August 20th, 2007 12:07 pm

    Very thoughtful post as always. But your argument boils down to “regression to the mean” which if applied to Ibanez would indicate his performance across the remaining 6 weeks should be enough to offset his statistically aberrant performance for most of the season and deliver a set of results consistent with his previous five seasons. So the conclusion is; play Raul…….

    Ok, ‘regressing to the mean’ doesn’t actually mean that results from here on out are expected to even out with what we’ve been getting beforehand.

    Say you have a fair coin, you flip it 10 times and get 10 heads.

    Regression to the mean would say that the next 10 flips are expected to produce about 5 heads and 5 tails, NOT 10 tails just to even things out.

    With baseball players, it’s not quite as simple as coin flipping – you have to work out where the mean is and how much to regress numbers by, but you don’t go, “he’s performed badly, I expect him to be AWESOME from here on out,” you say, “he’s performed badly, I expect him to perform closer to career norms from here on out.”
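    The coin example can be worked through directly. The sketch below assumes a fair coin and independent flips, and the function name is made up for illustration. It computes the expected overall share of heads after a 10-for-10 start: the share drifts back toward 50% because new flips dilute the streak, not because tails are owed.

```python
def expected_heads_share(heads_so_far, flips_so_far, more_flips, p_heads=0.5):
    """Expected overall fraction of heads after `more_flips` further flips.

    Future flips are assumed independent of the past, so each one
    contributes p_heads expected heads no matter what came before.
    """
    expected_total_heads = heads_so_far + p_heads * more_flips
    return expected_total_heads / (flips_so_far + more_flips)


# Start from 10 heads in 10 flips, then keep flipping a fair coin.
for more in (10, 90, 990):
    share = expected_heads_share(10, 10, more)
    print(f"after {more} more flips: expected heads share = {share:.3f}")
```

    After 10 more flips the expected share is still 75%, not 50%; it only approaches 50% as the extra flips pile up.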

  109. Jeff Nye on August 20th, 2007 12:10 pm

    Edman, I think I addressed the point you’re trying to “illude” to in my post #50, not to toot my own horn too much.

    Cold streaks, when they fit with what we know about how players’ performance declines with age, should be given more weight than hot streaks that go counter to the trend.

    Vice versa with younger players; hot streaks with younger players have a better likelihood of being meaningful than their cold streaks.

  110. Tropics iRE on August 20th, 2007 12:11 pm

    i know this sounds silly.. but when i have a hot player on my fantasy team i try to bench him…

    I know this doesnt really apply in Real-World Baseball, but i think its one of the quirks of pro-sports, and juggling #’s.

    1st reason i bench the hot player… my expectations are too high for that player to be able to satisfy my expectations for the next weeks games.

    2nd reason i bench a hot player (totally not based in reality)… i bench the hot players in fantasy because i almost always have a comparable player on the bench, and so i choose to allow my team to potentially score less points knowing that even if that benched hot player continues to play hot, none of my opponents get points for him.

    so far its helped me win lots of fantasy championships, but that means nothing…

    as far as real world evaluation of a player and predicting how they will perform… i tend to just trust my gut … i know its not so high tech… but i seem to have a great amount of success using my gut feelings.

    -Ti

  111. Ace on August 20th, 2007 12:11 pm

    Over the last three years (2004-2006) Ibanez’s numbers against left handed pitchers are .269/.329/.406. His numbers this year are .278/.303/.365. Obviously not great, but to make the case that Raul should never play against left handed pitching seems a little extreme. A few more days off in the coming stretch would be a good idea, and if they come on days a left handed pitcher is on the mound, fine, but if they face three left handers in a week, there is no way Raul should sit all three days.

    Just like in any job, a player needs to know what he has to do to be successful. When a player does well you need to reward that success. I remember a few weeks ago when Santana was on the mound and Dave said that whatever Raul’s previous success against Santana, Raul should sit that night. Raul played and was 1-3 with a run, in a 4-3 win. A good night for any ballplayer against Santana. And I am sure the next time they play against Santana, Raul will be rewarded with another start. And that is how it should be.

  112. Ralph Malph on August 20th, 2007 12:17 pm

    The study does show a correlation between the 5 game hot/cold streak and short-term future performance. It is just an extremely weak correlation (.004 wOBA).

    I would expect if you used a shorter streak length (1 game, say), you would see an even weaker (or perhaps undetectable) correlation, due to sample size factors.

    But conversely, if you used a longer streak length (10 games, or 20 games, or whatever) I would expect to see a stronger correlation. If you went to 1 season, or 2 or 3 seasons, you’d be looking at a long enough period that it would have a much better predictive value.

    I would predict that the longer the streak length, the better the correlation to future performance.

    To me the interesting research would be to determine if this is true and, if so, what length sample is necessary to have a statistically significant predictive value. 6 weeks? If so, Vidro’s “hot streak” is worth considering in setting lineups.

    I don’t see an answer to that in the Tango et al study. Though admittedly I haven’t read the book. Does anyone know if research of this type has been done?
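    The sample size intuition in this comment can be illustrated with the usual binomial standard error. This is a simplified sketch that treats every plate appearance as an independent trial with a fixed on-base probability, which is an assumption; the .330 rate and the PA counts are made up. It only shows how sampling noise shrinks with more PA and says nothing about whether streaks themselves predict anything.

```python
import math


def standard_error(rate, plate_appearances):
    """Binomial standard error of an observed rate over n independent trials."""
    return math.sqrt(rate * (1 - rate) / plate_appearances)


# Hypothetical .330 on-base rate over a five game stretch, six weeks, a season.
for pa in (20, 150, 600):
    print(f"{pa:4d} PA: +/- {standard_error(0.330, pa):.3f}")
```

    With 20 PA the noise is on the order of a hundred points of OBP, which is one way to see why a five game sample tells you so little.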

  113. Jeff Nye on August 20th, 2007 12:20 pm

    I’m not so concerned about “the human factor”, and I’d suggest that none of us should be as concerned about it as some folks seem to be.

    One of the very few actual pieces of work that a major league manager has to do is to be able to massage egos in the clubhouse; if your manager isn’t able to effectively deal with a player who feels, based on a small sample size, that he has “earned” playing time, then you need a new manager, because his lack of that skill is interfering with the ability of the team to put the team with the best chance to win on the field.

    It really is that simple.

  114. Chris Miller on August 20th, 2007 12:20 pm

    You do want to weight the most recent performance. 1 year of change does change the projections, as does 1/2 year, the problem is using that 1/2 or 1 year AS the projection, instead of weighting the performance into the projection.

  115. Chris Miller on August 20th, 2007 12:22 pm

    Tango uses a 5/4/3 weight for the last 3 years data: http://www.tangotiger.net/archives/stud0346.shtml
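    A minimal sketch of the 5/4/3 idea, using only the weights mentioned above. The wOBA figures are hypothetical, the function name is an assumption, and any further adjustments a full projection system might apply (playing time, regression, aging) are left out.

```python
def weighted_rate(rates_recent_first, weights=(5, 4, 3)):
    """Blend per-season rates, most recent season first, using 5/4/3 weights."""
    weighted_sum = sum(rate * weight
                       for rate, weight in zip(rates_recent_first, weights))
    return weighted_sum / sum(weights)


# Hypothetical wOBA by season: a down year most recently, two better years before.
seasons = [0.310, 0.350, 0.360]

projected = weighted_rate(seasons)
print(round(projected, 3))
```

    The result sits between the down year and the plain three year average: the recent season counts for more, but it is not used as the projection by itself.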

  116. DMZ on August 20th, 2007 12:22 pm

    There’s no good evidence that longer streaks have more predictive value.

  117. sparky on August 20th, 2007 12:22 pm

    One thing I wonder is if there is a way to disentangle the causes of the hot streak and their potential implications for future performance. Overall, the numbers indicate that hot streaks do not predict future performance. However, it may be that certain types of hot streaks do. Consider two hypothetical types of streaks:

    1. The player has been hitting the ball hard all year (e.g., consistent line drive %), and it just so happens that a few extra hits fall in over a 5 game span resulting in an OBP/SLG spike.

    2. The player got injured in the off-season, and the injury resulted in a change in his batting stance. Several months into the season, the hitting coach notices this change and helps the player correct it. It seems possible that the resulting change could generate a sudden shift in production.

    In the first example, there’s no reason to believe the spike is sustainable. However, a fundamental change (e.g., batting stance, recovery from undiagnosed injury) in approach might be more sustainable.

    Assuming this premise is true, the question becomes how to identify the different types of streaks. It seems the best way to do this would be to use batted ball data. Did Tango et al control for shifts in LD%, FB%, etc. in their model? (and, on a slightly different note, did they control for change in level of competition…a few games where a batter is facing Ho-Ram, bad Weaver, and Washburn in a hitters park is a hot streak waiting to happen for any player).

    This is a lot of assumptions, and the data presented above fly in the face of most of them. However, it would seem that not all hot streaks are the same, and it is unlikely that simple luck is the deciding factor in all of them.

    As I read this over, I actually think my real question is about the best way to quantify/compare hitting mechanics (and, by extension, the impact of hitting coaches).

  118. Joel on August 20th, 2007 12:25 pm

    Excellent post. I think DMZ touched on this above, but using the logic of predictive hot/cold streaks, does this mean that batter vs. pitcher “hot or cold” streaks are meaningless as well? I’ve long thought it ridiculous when I see a stat come up claiming a batter is “15 for 23 vs. pitcher X” as an example, and think this is a repeatable “skill”, the ability to hit well against a certain pitcher. I’ve always believed these face to face matchup statistics were meaningless (unless you factor in abilities, as DMZ mentioned)

    I guess what my question boils down to is this:

    Have there been any players who have performed so well against a certain pitcher, in a large enough sample size, to show they “own” (for lack of a better word) them? That they hit significantly better against one specific player?

    Thanks guys and keep up the great work.

  119. Robo Ape on August 20th, 2007 12:26 pm

    Derek RE: 101

    Maybe I’m mistaken, but is Dave’s analysis in this post (taken from the book, granted) not in and of itself acknowledging the existence of streaks? I don’t have the papers in front of me any more, but I remember pursuing this question personally in graduate school and seeing that the reason no baseball specific hot hand follow up paper was written was because of the perceived potential existence of streaks, not in spite of it. I suppose it would be harder to demonstrate that streaks exist than it would be to show that they don’t, though I suppose the burden of proof has to go that way.

    In any case, that was what I remembered. I very well could be dead wrong.

    The irony of such a post would be both hilarious and embarrassing.

  120. arbeck on August 20th, 2007 12:28 pm

    2. The player got injured in the off-season, and the injury resulted in a change in his batting stance. Several months into the season, the hitting coach notices this change and helps the player correct it. It seems possible that the resulting change could generate a sudden shift in production.

    This wouldn’t count as a hot streak according to the study being cited. If the injury caused a change in the batting stance that the coach noticed, it would be reasonable to assume that the hitter was hitting below his expected wOBA. Therefore, after making the change he probably approaches his expected wOBA. Remember that it is calculated using data for the last 3 years.

  121. IdahoInvader on August 20th, 2007 12:29 pm

    Sometimes I honestly wish I could see Dave’s and/or DMZ’s facial expressions as they type, when I read exchanges like in 47/48.

    I’m sure it would mirror what will occur on the mound for us tonight.

  122. DMZ on August 20th, 2007 12:31 pm

    Streaks exist. Streaks do not have significant predictive value.

  123. Ralph Malph on August 20th, 2007 12:32 pm

    is Dave’s analysis in this post (taken from the book, granted) not in and of itself acknowledging the existence of streaks?

    The question isn’t whether streaks exist. Of course they exist. When you flip coins, streaks exist — sometimes you flip heads 3 times (or 5 times or whatever) in a row. Sometimes batters get more hits, or fewer hits, over a 5 game stretch.

    The question is whether they mean anything — whether they have any predictive power — or whether they are simply random chance.
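    The coin flip point is easy to check by simulation. The sketch below flips a fair coin 500 times and reports the longest run of identical results; the seed and the number of flips are arbitrary choices. Nothing but chance is at work, yet runs still appear.

```python
import random


def longest_run(outcomes):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for outcome in outcomes:
        # Extend the current run on a repeat, otherwise start a new run.
        current = current + 1 if outcome == previous else 1
        best = max(best, current)
        previous = outcome
    return best


rng = random.Random(2007)  # fixed seed so the run is repeatable
flips = [rng.choice("HT") for _ in range(500)]
print("longest run in 500 fair flips:", longest_run(flips))
```

    The same function can be pointed at any sequence, for example a list of hit/no-hit games, to compare a real streak against what pure chance produces.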

  124. joser on August 20th, 2007 12:33 pm

    I really need to finish the projection post, eh? A lot of these subjects will be covered in that.

    Yes, most definitely! This has nagged at me for a long time. We’re all (mostly) careful to note “small sample size” but what’s “big enough” sample size? Obviously when one AB is going to change the first digit of a player’s average, the sample is not large enough; conversely, when it’s only going to affect the 3rd digit, it very clearly is. But where is the dividing line? For the most-frequently cited stats, what’s sufficient sample size — and what’s too much?

    Because we know players change over time. Their careers follow a path, usually an arc of some sort, and we can probably group players into similar arcs based on their skillsets and body types. But players are individuals who can sometimes defy the obvious trends (and even age, for a time) and who sometimes suffer injuries that diminish their abilities or surgeries/rehabilitation that improve them. And then you have the real outliers (Moyer for example). With them, we just have to shake our heads in silent admiration and move on. But most players follow some kind of projectable path. So how do you make that projection?

  125. arbeck on August 20th, 2007 12:33 pm

    Joel,

    We are discussing chapter 2 of The Book and you are interested in chapter 3.

    Basically, the answer to your question is no though.

  126. Tropics iRE on August 20th, 2007 12:34 pm

    all we know about streaks is this….

    when its over (the streak) .. player x is no longer playing above expectations… (some call me a genius, but you dont have to if you dont want to) ;)

    here is to Ichiro going on the streak of all streaks.. and breaking the all time hitting streak record just because he is looking for a new record to break.

    -Ti

  127. sparky on August 20th, 2007 12:36 pm

    120. Good point. I guess maybe a better example would be some new input that results in an improvement above the performance over the previous 3 years. This is a lot less common. One example might be guys who get corrective eye surgery or use some newfangled contact (I remember this being cited as evidence for Brian Roberts’ power increase a couple years ago).

    Not likely I guess…but I wonder if there are numbers (or previous research) that can answer such a question.

  128. Joel on August 20th, 2007 12:38 pm

    Re: 125

    Thanks…looks like I gotta get me some of this Book.

  129. lailaihei on August 20th, 2007 12:40 pm

    Say I want to read a couple baseball stat books. The Book will be my first. What other ones should I order from my library?

  130. joser on August 20th, 2007 12:41 pm

    The question is whether they mean anything — whether they have any predictive power — or whether they are simply random chance.

    Exactly. When this came up in another thread, several people cheerfully admitted to “playing the hot hand” when they were playing Strat-o-matic – knowing full well nothing was determining those streaks except random rolls of the dice. There’s no chance these (imaginary) players were seeing the (imaginary) ball better, or had adjusted their stance, or their swing, or whatever. It was pure, random chance. This is about as clear an example as you can find of our natural human biases to see patterns where there are none and to overweight recent experience.

    That said, it’s such a fundamental human tendency (seeing patterns is, at its most basic level, what our human brains spend almost all their time doing, which is why we can be so deeply fooled) that benching a player who is on a hot streak would seem to be such an unnatural act, and one almost no manager would dare try.

  131. arbeck on August 20th, 2007 12:43 pm

    sparky,

    Dave kind of addressed this though. If you see a guy start a streak, you need to look for a difference in approach. If you can positively say that on date ‘A’, he changed his stance or started wearing contacts, and has improved since that date, it may be a valid indicator of future performance. If the player is consistently trying to pull the ball, or working the count more, or doing something that you can quantify differently, it might also be a predictor of the future.

    The problem is when people say player ‘X’ has hit well since date ‘A’. What happened near date ‘A’ to make the difference? More than likely nothing happened, and it’s just a random streak.

  132. joser on August 20th, 2007 12:47 pm

    Sometimes I honestly wish I could see Dave’s and/or DMZ’s facial expressions as they type, when I read exchanges like in 47/48.

    Actually, as soon as I finished the post I thought “Now there’s going to be somebody posting something that completely misses the entire point, including Dave’s plea at the end.” The only question was whether Dave would respond before someone else did, and how.

  133. kentroyals5 on August 20th, 2007 12:47 pm

    Great post Dave.

    I was wondering if there is some equivalence for pitchers going on a ‘hot streak’. A couple that come to mind would be Weaver’s resurgence, or Webb’s ridiculous scoreless streak.

    I understand the whole ‘regression to the mean’ with Weaver, but there are hardly skill-sets, even in the NL, that suggest nearly 50 scoreless innings in a row is nothing but a hot-streak. So, what can you tell me about this argument in the realm of pitchers?

  134. arbeck on August 20th, 2007 12:53 pm

    kentroyals5,

    It’s nothing but a hot streak. There are hundreds of thousands of 50 inning samples for pitchers. While it is unlikely that a 50 inning scoreless streak could happen, if you have enough samples even the unlikely becomes likely.
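
    A rough way to put numbers on “enough samples” is the chance of seeing a rare event at least once. The sketch below uses invented inputs (a 1-in-100,000 chance per stretch, 200,000 stretches) and assumes the stretches are independent, which overlapping 50-inning stretches are not, so read it as an illustration only.

```python
def prob_at_least_once(p_single, n_samples):
    """Chance that an event with probability p_single happens at least
    once in n_samples independent tries."""
    # Complement rule: 1 minus the chance it never happens
    return 1.0 - (1.0 - p_single) ** n_samples

if __name__ == "__main__":
    # Hypothetical numbers, not measured from real pitching data
    p = prob_at_least_once(p_single=1 / 100_000, n_samples=200_000)
    print(f"chance of at least one such streak: {p:.1%}")
```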

    The idea you should take from this is that just because Webb has a 50 inning scoreless streak, or Johan Santana struck out 17 last time out, we shouldn’t base our expectations for their next outings on their most recent outings. You want to predict their future by using their xFIP for the last 3 years, with a weight towards the more recent performances.
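
    One way to sketch a recency-weighted three-year number is below. The 3/4/5 weights, the function name, and the sample seasons are assumptions chosen for illustration; nothing in this thread specifies them, and many projection systems also regress toward league average, which this skips.

```python
def weighted_projection(seasons, weights=(3, 4, 5)):
    """Blend three seasons of a rate stat (e.g. xFIP), newest weighted most.

    seasons: list of (rate_stat, innings) ordered oldest to newest.
    Each season counts in proportion to weight * innings.
    The default weights are an assumption, not an established standard.
    """
    total = sum(w * ip for (_, ip), w in zip(seasons, weights))
    return sum(w * ip * stat for (stat, ip), w in zip(seasons, weights)) / total

if __name__ == "__main__":
    # Made-up seasons: (xFIP, innings pitched), oldest first
    print(round(weighted_projection([(4.10, 180), (3.70, 210), (3.40, 200)]), 2))
```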

  135. Robo Ape on August 20th, 2007 12:55 pm

    122 and 123: Okay, what I should have said was, when I talk about the existence of hot streaks, I am talking about significant differences from typical performance, not normal, random variation that creates (falsely) what looks like streaks. Yes yes, of course I’m familiar with the coin flip analogy, and the massive and falsely perceived streaks it can generate, but let’s take a case study that we’re all familiar with: Ichiro’s early season performance. Looking at his splits, he’s hitting something like .285 for the month of April over his last four seasons (inclusive of this year). Would you guys argue that, in general, his performance over the first 15 days of April is not predictive for the second 15 days of April?

    Regardless, the only point I really wanted to make was that I can tell you, certainly, that cognitive psychologists have intentionally stayed away from analyzing baseball as a hot-hand phenomenon fallacy expressly because of the difficulty in discerning when a streak is an actual significant deviation rather than just the normal sorts of variations one sees in random sequences.

  136. joser on August 20th, 2007 12:56 pm

    The problem is when people say player ‘X’ has hit well since date ‘A’. What happened near date ‘A’ to make the difference? More than likely nothing happened, and it’s just a random streak.

    Yeah, and that’s why it would be great to be able to see tape, talk to the player, or otherwise have access to information beyond the stats. To take this away from hitters for a moment, in last week’s great Contreras controversy, I was theorizing that something had happened to him prior to June 24: in his appearances before that start he gave up 31% flyballs; in his appearances after that he gave up 46% flyballs. It could be statistical variation, but it’s a big enough swing over enough IP to suggest something about him has changed. And your whole position on the “trade for him or not” question would surely turn on whether something has and whether you think it’s something that can be changed back. But that isn’t something you can determine just from the numbers.

  137. zugzwang on August 20th, 2007 12:57 pm

    DMZ at 116 –

    I think Ralph Malph’s question is, when do longer “streaks” no longer count as “streaks,” but as a sufficient amount of data to have significant predictive value? The only difference between a 20 AB stretch and a 3 years stretch of data is the sample size. As you move from 20 ABs to 1500, you get increasing predictive value. So, at some point an extended hot “streak” can’t be dismissed just as a “streak.”

    Also, thanks to zzyzx for articulating some of the thoughts I’ve been muddling over while considering the predictive value of Sexson’s and Ibanez’s horrid first half of 2007 vs. their longer-term production profile. Four months of sucking air has strong predictive value, but it has to be looked at against an even larger sample set to see if it makes sense. These guys are both in the parabolic decline portion of their careers, but it’s hard to know how steep the decline will be.
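
    The sample-size point can be made concrete with the standard error of a batting average. This is a simplified sketch that treats every at-bat as an independent trial with the same hit probability; the .280 hitter is hypothetical.

```python
import math

def average_standard_error(true_avg, at_bats):
    """Standard error of an observed batting average for a hitter with a
    fixed true average, under a simple binomial model."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)

# Roughly two standard errors either side of a hypothetical .280 hitter
for n in (20, 100, 500, 1500):
    se = average_standard_error(0.280, n)
    print(f"{n:>5} AB: roughly {0.280 - 2 * se:.3f} to {0.280 + 2 * se:.3f}")
```

    The band narrows with the square root of the at-bats, so there is no single point where a “streak” turns into “data”; the observed number just gets gradually more trustworthy.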

  138. Sec 108 on August 20th, 2007 1:02 pm

    Dave – I just now had a chance to read your post on my lunch break. Great stuff! You consistently make me appreciate the fact that I paid attention in Statistics class in college. I very much am looking forward to your next post.

  139. sparky on August 20th, 2007 1:05 pm

    131. I guess that I’m wondering if there is any way to use numbers to identify when the change occurred. The social scientist in me would suggest a content analysis of media reports that mention these types of changes (e.g., Lexis search for batting stance). However, these stories tend to only appear after the fact (as justification for the change). There are few articles indicating that a guy changed his glasses but kept on sucking.

    As such, the next best way would be to find the underlying statistics that reflect this change. Usually, we pick arbitrary points (player is hitting .400 since the All-Star break, player hit 15 homers this month). I’m wondering if data (like LD%, longer-term trends in BABIP or isolated power) can be used to identify these changes (and, more interesting for this conversation, the point at which a change occurred…assuming there is one). However, this would require some non-arbitrary definition of when that change occurred.

  140. gwangung on August 20th, 2007 1:13 pm

    Also, thanks to zzyzx for articulating some of the thoughts I’ve been muddling over while considering the predictive value of Sexson’s and Ibanez’s horrid first half of 2007 vs. their longer-term production profile. Four months of sucking air has strong predictive value, but it has to be looked at against an even larger sample set to see if it makes sense. These guys are both in the parabolic decline portion of their careers, but it’s hard to know how steep the decline will be.

    This brings to mind the old TRADITIONAL baseball saying, “It’s better to trade a player a year too early, than to trade him a year too late.”

    It appears to me that the Ms organization doesn’t believe in that saying (and may not have heard about it). The use of stats and an understanding of streak behavior would seem to me to be a useful diagnostic tool to help organizations determine what year it is.

  141. gwangung on August 20th, 2007 1:15 pm

    As such, the next best way would be to find the underlying statistics that reflect this change. Usually, we pick arbitrary points (player is hitting .400 since the All-Star break, player hit 15 homers this month). I’m wondering if data (like LD%, longer-term trends in BABIP or isolated power) can be used to identify these changes (and, more interesting for this conversation, the point at which a change occurred…assuming there is one). However, this would require some non-arbitrary definition of when that change occurred.

    Keeping in mind that these are signs and not explanations of results, wouldn’t it work to run different parameters, find the clearest break and go back to eyeball the behavior to see what, if anything, is different?
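
    A bare-bones version of “find the clearest break” might look like the sketch below. It tries every split point and scores how far apart the before and after averages are, relative to their noise. The function name, the minimum segment size, and the sample flyball rates are hypothetical, and because it keeps the best of many splits the score is inflated, so it is a pointer for where to go look at tape, not a significance test.

```python
import math
from statistics import mean, variance

def clearest_break(values, min_size=5):
    """Return (index, score) for the split that best separates
    values[:index] from values[index:].

    The score is a Welch-style t statistic: gap between the two means
    divided by the combined standard error.
    """
    best_index, best_score = None, 0.0
    for k in range(min_size, len(values) - min_size + 1):
        before, after = values[:k], values[k:]
        gap = abs(mean(after) - mean(before))
        spread = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
        # Guard against perfectly flat segments (zero variance on both sides)
        score = gap / spread if spread > 0 else (math.inf if gap > 0 else 0.0)
        if score > best_score:
            best_index, best_score = k, score
    return best_index, best_score

if __name__ == "__main__":
    # Made-up per-start flyball rates with a jump after the sixth start
    rates = [0.31, 0.29, 0.33, 0.30, 0.32, 0.31, 0.45, 0.47, 0.46, 0.44, 0.48, 0.46]
    print(clearest_break(rates, min_size=3))
```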

  142. Alex on August 20th, 2007 1:19 pm

    Dave,
    Your statistical analysis is flawless. Every GM should use it when constructing a team.

    But, a field manager cannot simply apply statistical principles to managing the team. Why? He must keep players happy by appearing to be fair and by favoring veterans who have “earned” playing time. If you don’t, you’re asking for open rebellion and chaos. These players are overpaid, over-confident, and over-pampered juveniles who will do anything to get you out of the way if you interfere with their God-given right to play. They will also do whatever it takes to remind rookies that they don’t deserve anything and have to “pay their dues.” This psychological profile is a reality that must be dealt with. I don’t like it but it’s McLaren’s reality.

    A GM, on the other hand, MUST use the statistical model to construct the ballclub.

  143. SequimRealEstate on August 20th, 2007 1:25 pm

    [deleted, see comment guidelines w/r/t copyrighted material]

  144. HamNasty on August 20th, 2007 1:25 pm

    This is half way on topic, it goes to a post I have seen a couple times a week and earlier. In regards to “contract/walk” years.

    Off the top of my head I can think of the extremes and also a median case for the theory: Beltre HOT, Andruw Jones AWFUL, and Meche about as close to the median as I can think of. The only differences for Meche are walking 1.4 fewer per 9 and a GB rate up 5.4%; his HR rate is up 1.5% and his K’s are even down from last year. Those deviations are nothing crazy enough to assume he is now a “different” pitcher in my eyes.

    In summary, I would assume that a hot contract year lies in the same category as a hot streak, unsustainable and unpredictable.

  145. gwangung on August 20th, 2007 1:26 pm

    re 142

    I’d only amend that to say a field manager cannot blindly apply statistical principles. And there’s nothing to say that he can’t coax players into following statistical principles by the adroit use of psychology.

    But in general, you’re right. Clear principles and rules tend to work best in managing anyone—and rewarding good behavior is still a good idea…

  146. gwangung on August 20th, 2007 1:29 pm

    re 143

    It’d be nice to be able to support that with changes in batting stances over the season….

    Could be true, or it could be after the fact rationalization…

  147. Ralph Malph on August 20th, 2007 1:29 pm

    If Pentland and McLaren knew Ibanez was hurt and then his swing was all screwed up, and the ball was like a marshmallow coming off his bat, and they needed to work with him on his mechanics in the cage before he could drive the ball, why was he starting all through July and stinking it up?

    Don’t blame Raul. But absolutely blame McLaren for intentionally playing a guy who he knew was incapable of performing well.

  148. Rick L on August 20th, 2007 1:44 pm

    I guess a simple way to state this is that a player who is hot will eventually cool down and a player in a slump will eventually warm up. This latter argument is probably why Grover and McLaren keep marching Richie Sexson out there. Without knowing it, Mac is agreeing with you that a hot or cold streak has little predictive value for tonight’s game.

    The thing about the study you cite is that the streaks in question are rather short, whereas Raul’s is nearly three weeks (1.333 OPS for August), Vidro’s is a month and a half (.967 OPS since the all-star break), and Sexson’s is interminable. At what point do we credit a player who is playing better or worse than his past few months with having turned a corner?

  149. Ralph Malph on August 20th, 2007 1:48 pm

    148 — That is the question I asked in 112, which no one has really answered. When you say turned a corner, that corner could go in either direction of course.

  150. HamNasty on August 20th, 2007 1:53 pm

    Was there any study in The Book or anywhere that shows a “streaky” hitter exists? Or even constant hitter?

    I have heard that title applied to Brad Hawpe a lot as a streaky hitter. I also realize if you look at certain sets of data everyone turns into streaky or constant. But my question is do certain players hit high and low marks more often than others? If so, is there a common factor among those hitters?

  151. Dave on August 20th, 2007 1:55 pm

    The concept of a longer streak, changing skillsets, and how to tell when performances matter will be covered in the next post, probably tomorrow.

    But my question is do certain players hit high and low marks more often than others?

    Yep – contact singles hitters (think Ichiro) have significantly more variability in their performances than patient power hitters, simply due to the nature of their hits. The margin between a single and an out is a lot less than the difference between an extra base hit and an out.

  152. Bernoulli on August 20th, 2007 1:56 pm

    It would help if we knew why they were turning the corner, wouldn’t it? Other than that mystical, unpredictable (and therefore, to me, unreliable) “seeing the ball well”.

    Mr. Baker in his latest blog entry refers to Sexson’s numbers in September exceeding his career norms. Why do they do this? What makes us think they’re going to happen again?

    Meanwhile, playing through an injury is noble and all. But if doing so takes a variable we can understand and control and renders it into guesswork, isn’t that bad for the team? If Ibanez wants to make his injuries appear like age-related decline, I say let the team treat it like an age-related decline. The fact that he did this only increases the chance that he’s willing to do it again next week. I don’t know about you, but I don’t like to have to take that chance.

  153. HamNasty on August 20th, 2007 2:02 pm

    Dave 151- Thanks. Brad Hawpe is by no means a single hitter, but his RHP/LHP splits are rather large. Hawpe’s splits below.
    .635 OPS LHP
    .889 OPS RHP

    I have noticed this with Morneau also, who had large split numbers. Would that affect a streak also?

    Does it only pertain to singles hitters, would be my question, or are there other factors?

  154. arbeck on August 20th, 2007 2:08 pm

    LH/RH splits probably aren’t governed by streaks the way you are thinking. Quite a few LH hitters have pretty severe platoon splits. Because of this they very often don’t play against LHP. Therefore the sample of at bats is not large enough to be meaningful. You really have to go back to prior years to calculate it.

  155. HamNasty on August 20th, 2007 2:16 pm

    154- Those are Hawpe’s career numbers. I know the playing time situation falls into it. But also if they play fewer games against LHP that would make it more possible to maintain a constant average, correct?
    They could also get forced to face 4 lefties in 5 games and have to play, causing them a cold streak of sorts.
    I was just wondering if splits affect streaks because it seems hitters with bigger splits are streakier hitters. Has there been any research proving or disproving what I think?

  156. arbeck on August 20th, 2007 2:22 pm

    HamNasty,

    If you have a player with large platoon splits, and you let him start all four days when facing a left hander 4 out of 5 days, you really should just be shot. There is no reason to play the player in that situation.

  157. PullmansFinest on August 20th, 2007 2:26 pm

    What if the recent call up of Adam Jones is the reason for the recent successes of Ibanez and Vidro? What if they both feel the pressure and unlike some people *cough* Richie Sexson *cough* they are answering? I think that Adam Jones has helped this team a lot more than anyone realizes. Just his presence alone has turned the season around for two players. Now, if we can only find a way to call up every first baseman in the organization…

  158. arbeck on August 20th, 2007 2:27 pm

    PullmansFinest,

    So what you are saying is that Vidro and Ibanez both were not trying as hard as they could before Jones was called up? If that’s the case they deserve to be benched.

  159. HamNasty on August 20th, 2007 2:32 pm

    arbeck,

    Agreed, he should not be played. Sometimes you can’t help it if you do not have a suitable replacement or their defense is worth a struggle at the plate. Or you think you can get that left handed starter off the mound in 4 innings, Hi HoRam!

  160. PullmansFinest on August 20th, 2007 2:36 pm

    158-or they found a new gear and are just playing out of their minds right now. It’s the manager’s job, in my eyes, to find ways to motivate his players. McLaren found out how to motivate Ibanez and Vidro.

  161. jimbob on August 20th, 2007 2:42 pm

    I apologize if this has been pointed out in an earlier post: [deleted, off topic]

  162. Jeff Nye on August 20th, 2007 2:44 pm

    “Found a new gear”, or some mystical magical motivational skill on McLaren’s part, is really more realistic of an option to you than just random variation?

  163. JLC on August 20th, 2007 2:53 pm

    I could buy into the “made him work harder” argument if we were talking about players on middle to last place teams who were used to being stars and had no other reason to put out a full effort. I think those players exist.

    But I can’t imagine that would apply to players on a team that was unexpectedly in a playoff race.

    I also agree that with Raul in particular, to let him keep going to the plate when the staff knew he was hurt was inexcusable. I don’t particularly like “gamers” who play through injuries, causing more problems than if they’d just sit out for a while and let themselves heal and be more productive. I know there are players (all catchers, for example) who play hurt to some extent all the time. At the level of professional athletics, I expect competitive athletes who don’t want to take time off.

    That’s what the coaching staff is for, to make those kinds of decisions that the players are too involved with to make for themselves.

  164. gwangung on August 20th, 2007 2:54 pm

    I apologize if this has been pointed out in an earlier post: Doesn’t it seem the Mariners are hitting much better against lousier teams?

    Generally, I think that happens when good teams face bad teams.

  165. PullmansFinest on August 20th, 2007 3:00 pm

    The point is Ibanez and Vidro are playing a lot better than they should be. At this point McLaren would add another stupid decision to his long, long resume if he didn’t play them. If you were managing the M’s you’re telling me you would bench them? Mystical magical motivational skill? Gimme a break…

  167. JLC on August 20th, 2007 3:01 pm

    As far as small sample sizes predicting future performance, the Chicago announcers talking about Raul take the cake.

    “He just kills us. Against any type of pitcher, at any park. It’s just anybody wearing the [Chicago] uniform.”

    I’d never before heard fashion connected to batting.

  168. Ralph Malph on August 20th, 2007 3:06 pm

    Sure, you play Vidro and Ibanez. The question isn’t whether you play them. It’s when and where you play them — and how much.

    Behind fly ball pitchers (like Washburn), Ibanez should not be in LF, Jones should.

    Behind ground ball pitchers (like Felix), Vidro should not be at 2B, Lopez should.

    I would be mixing and matching Jones into the lineup by DH’ing Ibanez a lot and playing Vidro at 2B a little, and sitting Sexson against at least some RHP. But that would take some thought and planning.

  169. Rumpelstiltskin on August 20th, 2007 3:17 pm

    I haven’t had time to read all the comments, so I apologize if this has already been mentioned, but I strongly believe that the .004 difference is due to injuries. Players who have had a hot streak are less likely to be injured or somewhat injured compared to the average MLB player. And the reverse would also be true: guys who are on cold streaks are often mildly injured, resulting in a poorer performance. Remember earlier in the year when Raul Ibanez said he couldn’t pick up a bat with his left hand without it shaking? Um, I would’ve rather not had that guy in the lineup every day. Of course grit is valued more than winning in Mariner Nation…

  170. Jeff Nye on August 20th, 2007 3:28 pm

    I’m not going to go into what I would or wouldn’t do if I were managing the M’s, since the authors have asked us repeatedly not to turn this into a rehash of those discussions.

    Suffice it to say I would not Xerox the same lineup card night after night like McLaren does. Perhaps he gets commission from the people who sell toner to the Mariners.

    In any case, until I have real information on what is causing Ibanez and Vidro to “play better”, random variation makes much more sense to me than what basically amounts to voodoo.

  171. msb on August 20th, 2007 3:33 pm

    #158, 160 — FWIW, Norm Charlton opined yesterday that there isn’t ‘another gear’ — he noted that Ibanez was starting to pull out of his slump before Jones came up, and felt that it had nothing to do with being pushed.

  172. rsrobinson on August 20th, 2007 3:39 pm

    Will you just never allow random variation to enter your mind as a cause for anything?

    Over a relatively limited period I might. Since the study involves just five game periods, I’d agree that almost anything can happen on a baseball field over five games. The longer a player’s hot hitting continues, though, the less likely it would be due to random variation and the more likely it would be there are other factors involved, including the possibility that the player is just performing at a higher level for whatever reason.

    If, for example, a player continues to hit well above his expected wOBA for 30 games, then couldn’t it be said that his performance in those first five games WERE an indicator of future performance and that there may be other factors involved than just random variance?

  173. gwangung on August 20th, 2007 3:45 pm

    If, for example, a player continues to hit well above his expected wOBA for 30 games, then couldn’t it be said that his performance in those first five games WERE an indicator of future performance and that there may be other factors involved than just random variance?

    Well, then…take it a step further…take a look at players batting over a month (a little less than 30 games)? Do they exhibit marked fluctuations? If so, that argues against it….

  174. HamNasty on August 20th, 2007 3:48 pm

    Standard lineup as expected against Garza.

  175. VaughnStreet on August 20th, 2007 3:52 pm

    171– I heard that too. Charlton was talking out both sides of his ass. At first he said baseball players play at their max 100 percent of the time, then said they can’t play at max 100 percent of the time. In other words, they play hard all the time, except when they don’t. And competition among teammates is good, but it doesn’t make players play any better. Which is it Norm?

  176. Roger on August 20th, 2007 3:53 pm

    Thanks, Dave, for a great article.

    Your comment box needs a timer on it. If someone uses it to reply in a time less than it would take a normal person to read all of the comments, that comment should be thrown out. Or just search for and destroy comments that have “haven’t had time to” in them.

    Regarding Adam Jones pressuring other players to play better, these are pro athletes we’re talking about. These guys, with very very few exceptions, don’t know how to play at anything other than an ultra-competitive level. I don’t buy the “finding an extra gear” argument at all.

  177. HamNasty on August 20th, 2007 3:54 pm

    Everything fluctuates. Ichiro is a career .331 hitter; does that mean every night he goes 1 for 3 or 2 for 6? No. It is called an average for a reason.
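
    To see how much a single game can swing around an average, here is a small sketch of the spread of hits in a four at-bat game. It assumes each at-bat is an independent trial at a fixed .331, which is a simplification, and the function name is made up for the example.

```python
from math import comb

def hit_distribution(avg, at_bats):
    """Probability of 0, 1, ..., at_bats hits in a game, treating each
    at-bat as an independent trial with hit probability avg."""
    return [
        comb(at_bats, k) * avg**k * (1 - avg) ** (at_bats - k)
        for k in range(at_bats + 1)
    ]

for hits, prob in enumerate(hit_distribution(0.331, 4)):
    print(f"{hits} for 4: {prob:.1%}")
```

    Under these assumptions even a .331 hitter goes hitless in about one four at-bat game out of five, with no slump required.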

  178. Jeff Nye on August 20th, 2007 4:04 pm

    Since you refuse to accept that the hypothetical 30 game sample is due to random variation, it’s incumbent on you to provide us with a reasonable counter explanation that you can back up with actual facts.

    I humbly submit that something be added to the comment guidelines, along the lines of “the more extreme the hypothesis you present, the heavier the burden of proof is on you to provide factual support for that hypothesis”.

    It’s getting more and more tiresome to see “it’s not random” with no attempt to say what it IS if it’s not random, at least not an attempt that doesn’t include “anyone who watches the games should see” or “it should be obvious to anyone that”.

    Dave provided us with about two and a half screens worth of data (on my monitor, anyway) to support his assertion; you do him, and all of us, a disservice by saying “you’re wrong” and providing three lines with absolutely no factual information as your “support”.

  179. wallywwu on August 20th, 2007 4:07 pm

    Fantastic post, this stuff is why everyone says this is the best blog around.

    I also apologize if this has been talked about already, but I had one thing I noticed that was a little bit curious to me. It was mentioned in the statistics book that after each player had gone on their super hot streak that the authors had used an historical 3 year average to determine their expected numbers after the hot streak. This seems to me like using the ‘track record’ of a player to determine how they will do. I’ve been reading for weeks how the track record is a terrible way to evaluate future success.

    Now, I’m just sort of playing devil’s advocate here, but using the authors’ approach, this would make Ibanez’s future expectations look pretty decent since you would be factoring in his past seasons which were obviously much better than early this season.

    Really enjoyed the post, that just sort of stuck out to me.

  180. Jeff Nye on August 20th, 2007 4:13 pm

    wallywwu, to address what I think you’re asking about even though I’m not one of the authors:

    The experiment is a pretty large sample-size one, so it can’t cover everything, and situations like the one you point out surely exist; that’s why it’s important to HAVE a large sample size.

    So there’s going to be outliers like that, where the 3 year average is maybe a little more optimistic than it should be because it can’t take into account things like injuries, sudden disappearance of any baseball skill whatsoever, etc.

    I think in the individual case you’re talking about, it’s probably half over-optimistic, and half that the doom and gloom about the player in question was a bit overstated.

    At the time, it looked like it may very well be one of those rare circumstances where someone just loses all skills they ever possessed.

    But in the main, the data set is solid because the sample size is large enough to smooth out those outliers.

    I hope that answers what you were asking?

  181. rsrobinson on August 20th, 2007 4:42 pm

    Jeff Nye Says: In any case, until I have real information on what is causing Ibanez and Vidro to “play better”, random variation makes much more sense to me than what basically amounts to voodoo.

    What “real” information are you expecting? Ball players don’t come with video game health, energy, or mana bars to show what level they should be performing at. I don’t know all the underlying factors behind any athlete’s performance and neither does anyone else, including the athlete himself.

    And I was trying to keep Vidro out of this but if you don’t like my hypothetical 30 game sample then how about Vidro’s last 36 games, dating back to July 27, when he began producing much higher numbers than previously. That’s 31 games beyond when Dave’s study would have predicted he would come back down to what was previously expected of him. At some point doesn’t it begin to stretch credulity that it’s just simple random variation and instead represents a real and measurable improvement in performance?

  182. Ralph Malph on August 20th, 2007 4:58 pm

    31 games? Haven’t you ever seen a rookie come up in September and have a huge month and then never hit again? Does that mean he forgot how to play after that month?

    Remember Shane Spencer? 373/411/920 in 27 games as a rookie. Lifetime 262/326/428 hitter. Did he have a “real and measurable decline in performance” after that, or did he just have a really good month? Why didn’t that month predict his performance thereafter?

  183. John09 on August 20th, 2007 5:00 pm

    Dave,
    A point that can be made, though, is that this is a team sport. As such, while a player’s hot streak (especially a hitter’s vs. a pitcher’s) will cool off, he’s provided a tremendous benefit to the team while he’s on it. That is the reason they talk about a player like Sexson “carrying” a team. The team knows he’ll not hit that way all season, but he can win games by himself at times when he’s hot.
    Yes, a player will cool off and regress to the mean, but the benefit of several home runs or an RBI spree will offset his cooling period. And the reason for that is that it’s a team game. Other players will get hot and compensate for other players’ cold spells.
    The numbers you’re looking at are only in the context of a baseball card, and not in the context of a team.

  184. Ralph Malph on August 20th, 2007 5:05 pm

    The context is prediction. How do you predict who should play tonight? Based on who’s “hot”, or on who’s better?

    A player like Sexson might (once upon a time) have carried a team for a month. Does that mean he’s likely to be “hot” the next day?

    That is the point of the study.

  185. Ace on August 20th, 2007 5:05 pm

    167,
    Ibanez 2004 – 2006 stats:
    .290/.354/.475
    against the Sox:
    .293/.350/.478

    Decent numbers but he’s not killing them, this year he is a little better against the Sox: .452/.500/.806.

    But wow, he kills the Angels, last three years (230AB) .357/.411/.526.

  186. gwangung on August 20th, 2007 5:06 pm

    What “real” information are you expecting? Ball players don’t come with video game health, energy, or mana bars to show what level they should be performing at. I don’t know all the underlying factors behind any athlete’s performance and neither does anyone else, including the athlete himself.

    That’s the point of the large samples, to try and catch all of those.

    At some point doesn’t it begin to stretch credulity that it’s just simple random variation and instead represents a real and measurable improvement in performance?

    Yes…BUT NOT AT THESE NUMBERS.

  187. pinball1973 on August 20th, 2007 5:08 pm

    There is a reason to play a player immersed in a “hot streak” (however the particular booster of that player might define the term – see “clutch hitter”). It is very much like the reason we all used when watching the stage version of “Peter Pan” and clapped enthusiastically to prove we DID “believe in fairies” to revive Tinker Bell.
    Not to revile the fans who believe in such things, since superstition as a zest is an entertaining thing if there are no real costs for failure, but no manager has any such excuse for this kind of nonsense when in the middle of a hot pennant race.

  188. lailaihei on August 20th, 2007 5:10 pm

    Based on past performance, I predict a new game thread pretty soon.

  189. John09 on August 20th, 2007 5:18 pm

    Ralph,
    If the context is prediction, then I would say, definitely play the better player. But determining who the better player is isn’t so easy in the context of this discussion.
    Is Sexson a bad player who occasionally gets hot? This year isn’t a good one for him, but is that his history? If so, then how does that fit into the discussion?
    Do you play someone who’s more consistent? Is that better than streaky? Is someone who contributes a little “above average” to the team consistently the whole year (pick your example player) better than a guy who underperforms for a few weeks and then gets hot for a few weeks and carries the team?
    I think that big factors in helping determine the answer are things like defense, base running, etc…

  190. DMZ on August 20th, 2007 5:41 pm

    To return to what Dave said, if I might be so bold, to answer the question “when is it random variance and when is it real?”

    There is not, as Mr. Robinson mockingly suggests, some video game-style indicator (ha ha! original insult there!). However, as Dave suggests, you instead look for other skills. If a player who has never hit for a high average begins to hit extremely well, you look for other indicators: is there a pitch he’s learned to lay off? Is he hitting more line drives? Was there an injury (as Dave notes, in comments above) that might have hurt his hitting?

    At which point, you test that theory– have other players in similar circumstances seen the same effects?

    For instance, look at wrist injuries, which seem to have particularly harmful effects on power. If a player who used to hit for power broke their wrist, had a season where they couldn’t drive the ball, and then came back the next year to return to their power-hitting ways, there’s a skill-based explanation there.

  191. DMZ on August 20th, 2007 5:43 pm

    w/r/t players getting hot/cold in the team context:
    - again, there’s no evidence that hotness or coldness is real
    - it’s certainly not contagious
    - and does not carry any hot-streak-immunization properties
    - putting any team but the best one on the field is like intentionally giving your opponent the advantage

  192. PullmansFinest on August 20th, 2007 5:53 pm

    Things EFFECT random variation. What Ibanez is doing is hardly random variation.

  193. DMZ on August 20th, 2007 5:58 pm

    They’re not teaching the difference between affect/effect at Pullman? Really?

    I don’t know what kind of a case I could possibly lay out that would convince you to change your mind at this point, so I’m happy to concede that I am better off not continuing to try. Thanks.

  194. PullmansFinest on August 20th, 2007 6:16 pm

    Effect: Any result of another action or circumstance.
    Affect: Generally used to suggest emotion.

    DMZ,
    I’m not arguing with you or Dave. I couldn’t agree more. Actually, that’s a lie. I could. I get the article. If I felt in July that Ibanez should be benched for Jones, I should still feel like that because random variation will cause him to come back down into the stratosphere (the second layer of earth’s atmosphere, thanks Wazzu!). But why are we even talking about Ibanez when there is a better player than him sitting on the bench behind a player that’s been trash all year? Ibanez’s random variation shouldn’t be an issue right now. Sorry.

  195. PullmansFinest on August 20th, 2007 6:19 pm

    By the way, I was referring to Broussard when I said a better player and referring to Sexson when I said a player that has been trash all year.

  196. rsrobinson on August 20th, 2007 6:44 pm

    w/r/t players getting hot/cold in the team context:
    - again, there’s no evidence that hotness or coldness is real
    - it’s certainly not contagious
    - and does not carry any hot-streak-immunization properties
    - putting any team but the best one on the field is like intentionally giving your opponent the advantage

    You may be absolutely right. On the other hand, no manager in baseball would take Raul’s bat out of the lineup after watching him hit bombs for the past two weeks no matter how many statistical studies you quote them. Do you blame them?

  197. Jeff Nye on August 20th, 2007 6:48 pm

    Cool, we hadn’t had enough of the “appeal to authority” fallacy in this thread yet.

    I think we’ve hit our quota now though.

  198. terry on August 20th, 2007 7:08 pm

    And I was trying to keep Vidro out of this but if you don’t like my hypothetical 30 game sample then how about Vidro’s last 36 games, dating back to July 27, when he began producing much higher numbers than previously. That’s 31 games beyond when Dave’s study would have predicted he would come back down to what was previously expected of him. At some point doesn’t it begin to stretch credulity that it’s just simple random variation and instead represents a real and measurable improvement in performance?

    To be a “REAL” improvement in a skill, the improvement should be repeatable. To suggest that 69 PA’s in an August split informs a true change in ability stretches credulity because the suggestion ignores the well documented and dramatic variation that exists in the performance results of major leaguers. Such a suggestion especially stretches credulity when the implication is that a 33 yo with bad legs and three successive years of declining production has suddenly acquired a new, tangible, repeatable ability to put up this line: .415/.478/.509.

    Dave has recently argued that creating a lineup based upon estimating a player’s true skill set as precisely as possible is more accurate over time than relying upon results-based analysis. For those that would eschew this approach, I ask these questions:

    If the fact that Vidro and Ibanez have torn it up for 70 PA in August justifies maximizing their playing time in the future, how many PAs would it take to decide they aren’t hot hands any longer? How does the near major league worst defense of a guy like Ibanez fit into this “play the hot hand” philosophy? PBP metrics like UZR suggest that Ibanez is a -29 run defender over the course of ’07. He still needs a VORP of 14 over the remaining 40+ games just for his bat to roughly equal the value his glove has given up.

    It seems to me that not only does using streaks to project performance represent a very flawed approach from an accuracy standpoint, it also suffers by ignoring context.

  199. Trev on August 20th, 2007 7:29 pm

    If you’re wondering what the wOBA formula is:

    (.72*BB + .75*HBP + .9*1B + 1.24*2B + 1.56*3B + 1.95*HR) / (AB+BB+HBP)

    Some wOBA for the M’s:

    .358 Ichiro!
    .357 Jose Guillen
    .349 Jose Vidro
    .345 Adrian Beltre
    .344 Ben Broussard
    .340 Raul Ibanez
    .325 Kenji Johjima
    .319 Yuniesky Betancourt
    .308 Richie Sexson
    .290 Jose Lopez
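
    For anyone who wants to plug in their own numbers, here is a minimal sketch of that formula in Python. The weights and the AB+BB+HBP denominator are taken exactly as written above; the function name, the argument names, and the sample stat line are made up for illustration and are not any real player's totals.

```python
# Weights copied from the formula quoted in the comment above.
# Names are hypothetical, chosen only for this sketch.
WEIGHTS = {
    "bb": 0.72,
    "hbp": 0.75,
    "single": 0.90,
    "double": 1.24,
    "triple": 1.56,
    "hr": 1.95,
}


def woba(ab, bb, hbp, single, double, triple, hr):
    """wOBA as written above: weighted events divided by AB + BB + HBP."""
    denominator = ab + bb + hbp
    if denominator <= 0:
        raise ValueError("need at least one AB, BB, or HBP")
    counts = {
        "bb": bb,
        "hbp": hbp,
        "single": single,
        "double": double,
        "triple": triple,
        "hr": hr,
    }
    numerator = sum(WEIGHTS[event] * counts[event] for event in WEIGHTS)
    return numerator / denominator


# Invented stat line, for illustration only.
example = woba(ab=500, bb=50, hbp=5, single=100, double=30, triple=3, hr=20)
print(f"{example:.3f}")
```

    Note that singles have to be split out from total hits (1B = H - 2B - 3B - HR) before using it.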

  200. PullmansFinest on August 20th, 2007 7:30 pm

    197-As opposed to all the Ad Hominem fallacies that are committed on a much more regular basis on here?

  201. rsrobinson on August 20th, 2007 8:50 pm

    If a player who has never hit for a high average begins to hit extremely well, you look for other indicators: is there a pitch he’s learned to lay off? Is he hitting more line drives? Was there an injury (as Dave notes, in comments above) that might have hurt his hitting?

    But in the case of Ibanez and Vidro, since those are two guys being talked about here, you have guys with proven major league track records as quality hitters. I’ve read a lot of arguments here (not necessarily by you or Dave) about how their subpar first half performances were evidence that both were in permanent decline (therefore Adam Jones should be played every day) and now many of those same people are trying to explain away their improved second half performances as statistical anomalies (therefore Adam Jones should be played every day). The willingness to believe that the stretches of bad performances are a truer indicator of current ability than the stretches of good performances appears to me to be directly proportional to how badly they want Adam Jones in the lineup.

    So isn’t it possible that the assumption that both were washed up was premature, especially since both were previously plagued by injuries (for several years in Vidro’s case) but both now appear to be relatively healthy?

  202. flippy on August 20th, 2007 9:28 pm

    For what it’s worth:

    Geoff Baker mentioned in one of his posts that Ibanez made a change in his mechanics. He has since hit the ball better.

    I am not making a case for one way or another, this is just a fact.

  203. flippy on August 20th, 2007 9:28 pm

    I flipped a coin 100 times… heads was on a hot streak.

  204. flippy on August 20th, 2007 9:30 pm

    If Adam Jones’ presence has increased the play of Raul and Vidro, do you think we could get a Contreras look-a-like and sit him in sight of HoRam?

  205. heyoka on August 20th, 2007 9:58 pm

    Player A is a career 109 OPS, the previous 3 average 120 (variance of 9), and this year is at 112 (hot and cold streaks included).
    Player B has had a great recent minor league performance, and no significant major league experience.

    Let’s get to what this post is really getting at. Dismiss hot streaks, that’s fine, but it still doesn’t necessarily make Adam Jones an everyday offensive upgrade over Raul Ibanez, as Geoff Baker and all the rest of us who were stuck on his extended cold streak believed. (I still like this site’s contention that RI shouldn’t be in the lineup against LHP, or at least not in the 4-hole.)

    In 1990, George Brett looked like he was in full decline the first half of the season. In the second half he was on fire and stayed on fire to take the batting title. He reverted to previous form.
    1991 though he sucked…..the end of this season may be a good time to trade aging Raul to make room for Adam Jones. It’s inevitable. And a worthwhile risk.

    In conclusion, yeah, we shouldn’t look at the hot streak and say, ‘oh, he’s gonna stay hot,’ but at the same time reverting to previous performance level is also part of what the study suggests, and 2007 is a smaller, less predictive sample than 2004-2006.

  206. Jeff Nye on August 20th, 2007 10:04 pm

    Is Jones in LF every day an upgrade over Ibanez when you consider just offense? Arguable.

    Is Jones in LF every day an upgrade over Ibanez when you factor in defense as well? Not arguable.

    I’m not sure why this is so hard to grasp. :(

  207. Jigga on August 20th, 2007 10:18 pm

    I’d like to see an analysis on whether players performed better during hot streaks than during cold streaks. Oh, wait, that is self evident. I’m not saying Ibanez is better today than he was July 31 b/c he is hot today, but if you use a reliable indicator, such as performance over the last three years, you will.

    Dave, you make good points: no, the earth is not spinning on a different axis now that RI is ‘hot.’ But the Ms are scoring more. That’s what matters. Roll with it. Superstition has been an historical part of this crazy game, and that is unlikely to change. Baseball and life are games of moments … play them out.

  208. B_Con on August 20th, 2007 10:29 pm

    Tonight’s anecdotal evidence did not help this thread’s cause. That said, I agree we need Adam Jones in the OF. Sexson is my choice for sitting a man as of now. If Vidro wasn’t worse at second base than your average 7th grader trying to undo a bra-strap I’d bench Lopez and get Vidro in the field.

  209. julian on August 20th, 2007 11:54 pm

    “There’s a debate about whether hot streaks are random fluctuation of events or an actual change in skills for a temporary period of time. I don’t even begin to know the answer to that question, and I can see the validity of both arguments.”

    Actually, I think the statistical argument in The Book (as described in your post) suggests that hot streaks are much more likely due to random fluctuation of events than an actual temporary change in skill. If the “change-in-skill” hypothesis were broadly true, then you would expect to see higher wOBA after five-game “hot streak” periods, unless you believed that all players had better skills for exactly five games but no more.

  210. julian on August 21st, 2007 12:03 am

    203-

    That reminds me of an interesting experiment common to first-year probability theory classes:

    Tell everyone in the class that their homework assignment is to either: a) flip a coin 100 times and record the results, or b) make up the results and write them down.

    The professor collects the sheets, and then proceeds to announce who cheated and who didn’t. Sounds incredible! The key is that the professor looks for “runs” (“hot streaks”, if you will) of six or more heads or tails in a row; in a sequence of 100 flips, this will occur at least once with high probability. Of course, “cheaters” will almost never write down six or seven heads or tails in a row, because they don’t think it’s realistic.
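
    If you want to check the "high probability" part yourself, here is a rough sketch in Python. It assumes a fair coin and independent flips, and it computes the exact chance of at least one run of a given length (heads or tails) by tracking the length of the current run. The function name and arguments are just placeholders for this sketch.

```python
def prob_run_at_least(n_flips, run_len):
    """Chance that n_flips fair coin flips contain a run of at least
    run_len identical outcomes in a row (heads or tails)."""
    if n_flips < 1:
        return 0.0
    if run_len <= 1:
        return 1.0
    # state[k] = probability that no long run has happened yet and the
    # current run is exactly k flips long (k = 1 .. run_len - 1).
    state = [0.0] * run_len
    state[1] = 1.0  # after the first flip the run length is always 1
    hit = 0.0       # probability that a long run has already occurred
    for _ in range(n_flips - 1):
        new_state = [0.0] * run_len
        for length in range(1, run_len):
            p = state[length]
            if p == 0.0:
                continue
            new_state[1] += 0.5 * p          # flip differs: run restarts
            if length + 1 == run_len:
                hit += 0.5 * p               # flip matches: long run reached
            else:
                new_state[length + 1] += 0.5 * p
        state = new_state
    return hit


for k in (5, 6, 7):
    print(k, round(prob_run_at_least(100, k), 3))
```

    Running it for a few run lengths shows how quickly the odds fall off as the required streak gets longer, which is the intuition the fake sequences tend to miss.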

    I’m sure you could do a similar calculation and see that it would be surprising *not* to see a .300 career hitter with several extended periods where he hit above .450 or so.

  211. rsrobinson on August 21st, 2007 12:13 am

    Dave wrote: In July, Geoff was on board with the belief that Adam Jones would be able to help the Mariners as an everyday player, and the struggling veterans should be ceding playing time to the more talented youngster. He felt the struggles of guys like Vidro and Ibanez warranted a change, and Jones provided a superior option. He doesn’t feel that way anymore. Why? Because Raul Ibanez and Jose Vidro are hitting well recently, and Baker believes in the predictive power of the hot hand.

    Or it could be that, with another 30 or so games having been played, there’s a larger 2007 sample that provides a different picture of Vidro’s and Ibanez’s current abilities. Back in July, based on a subpar first half by both players, there was reason to believe that each had possibly reached a point of significant decline in their careers. With the addition of greatly improved second half numbers, the picture is much different now.

    Since you’re using wOBA as a stat to measure players, let’s look at Vidro’s and Ibanez’s 2007 wOBA to date and compare them with their career numbers:

    Vidro (2007 / Career): .353 / .361
    Ibanez (2007 / Career): .351 / .354

    Both players right now are posting 2007 numbers very similar to their career averages with only a minor statistical decline. And since .340 is considered to be about average, both could still be considered above average major league hitters according to their current wOBA numbers. So the fear that Ibanez and Vidro were experiencing significant career declines is no longer borne out by the numbers based on a larger 2007 sample size.

    Also, with both Vidro and Ibanez now within spitting range of their career numbers and still hitting above the major league wOBA average it’s less likely that Adam Jones would be able to match the offensive productivity of either player which would lessen his value as a defensive replacement in LF.

    In other words, the case for Adam Jones playing every day is statistically weaker now than it was in July based on a larger 2007 sample.

  212. milendriel on August 21st, 2007 1:21 am

    194: So you use “effect” as a verb and then give the definition of its noun form to justify your mistake? That doesn’t make a lot of sense.

  213. terry on August 21st, 2007 6:43 am

    But in the case of Ibanez and Vidro, since those are two guys being talked about here, you have guys with proven major league track records as quality hitters. I’ve read a lot of arguments here (not necessarily by you or Dave) about how their subpar first half performances were evidence that both were in permanent decline (therefore Adam Jones should be played every day) and now many of those same people are trying to explain away their improved second half performances as statistical anomalies (therefore Adam Jones should be played every day). The willingness to believe that the stretches of bad performances are a truer indicator of current ability than the stretches of good performances appears to me to be directly proportional to how badly they want Adam Jones in the lineup.

    So isn’t it possible that the assumption that both were washed up was premature, especially since both were previously plagued by injuries (for several years in Vidro’s case) but both now appear to be relatively healthy?

    Here’s a problem that I am having with your argument. Essentially April-July 26th was consistent with Vidro’s prior three years of decline. Then an unsustainable 4 weeks stretching through the latter part of July till now is being used to support the case that Vidro has some new skill or at least has rejuvenated back to his glory years in Montreal. Even after a hyperbolic bump in performance, he’s still a slightly below average DH. Basically, you’re ignoring Vidro’s decline and weighting his glory years when referring to his proven track record while also focusing on roughly 79 PAs in the now….

    This issue is at the heart of projecting performance accurately and it’s really you that seems to be throwing out the most relevant information when forming a conclusion. No one has a crystal ball so this argument isn’t about who’ll be right or wrong in the end (frankly anything is possible)-it’s a debate about methodology and “most likely” and Dave, Derek and others are on the right side of that debate.

  214. Colm on August 21st, 2007 7:11 am

    I spotted that too. PullmansFinest, you’re still wrong. Go look again at your dictionary.

  215. Adam S on August 21st, 2007 7:23 am

    Effect: Any result of another action or circumstance.
    Affect: Generally used to suggest emotion.

    Not that I come to USSM to discuss grammar/usage, but if I can educate one college student a day, in 100 years we’ll have a literate work force. Those definitions are correct for nouns. But you used “effect” as a verb. As a verb you want “affect” 99% of the time; the other 1% we’ll forgive you for being incorrect.

    On the other hand, no manager in baseball would take Raul’s bat out of the lineup after watching him hit bombs for the past two weeks no matter how many statistical studies you quote them. Do you blame them?
    I agree with your assessment of managers. It doesn’t make them right and yes I blame them. Ibanez has no business being a regular in the lineup against a left-handed starter — before, during, or after this hot streak. I don’t have a righty/lefty breakdown but of his 9 HR this month only one was against a lefty. I do understand that politically it’s hard to turn Ibanez into a platoon player when he’s hitting HR like Babe Ruth. I wonder if a manager should be concerned with politics or winning as many games as possible.

    This is just another case where conventional wisdom — riding streaks — is wrong. In fact it’s contradictory. McLaren is both playing Ibanez and Vidro because they’re “hot” AND playing Sexson because he’s cold and he believes Sexson will break out of it.

    Maybe if McLaren keeps reading blogs and has an open mind, he’ll learn something.

  216. rsrobinson on August 21st, 2007 7:27 am

    Here’s a problem that I am having with your argument. Essentially April-July 26th was consistent with Vidro’s prior three years of decline. Then an unsustainable 4 weeks stretching through the latter part of July till now is being used to support the case that Vidro has some new skill or at least has rejuvenated back to his glory years in Montreal. Even after a hyperbolic bump in performance, he’s still a slightly below average DH. Basically, you’re ignoring Vidro’s decline and weighting his glory years when referring to his proven track record while also focusing on roughly 79 PAs in the now….

    Vidro’s prior three years were also injury plagued which should be taken into account. It appears that DH’ing every day and staying out of the field has allowed him to remain relatively healthy, which was Bavasi’s hope when he traded for him. And now his 2007 wOBA, taking into account all hot and cold streaks over 420 ABs, is only slightly below his career average and is still above the major league average by more than ten points.

  217. Chris Miller on August 21st, 2007 7:27 am

    In other words, the case for Adam Jones playing every day is statistically weaker now than it was in July based on a larger 2007 sample.

    Not when you replace a -20 (or even -10, to be conservative) defensive Left Fielder with a Center Fielder. Adam Jones is still a better player than Vidro or Ibanez.

  218. Chris Miller on August 21st, 2007 7:30 am

    I’m willing to accept Ibanez isn’t done as a hitter (he has no business in the outfield). Vidro’s another story. His .317 BA is a fluke. His BABIP has jumped way above his last 3 years, which reeks of sample size flukiness. I’m still behind the idea that even one entire year is a small sample, especially for batting average (which is driving his OBP).

  219. Colm on August 21st, 2007 7:50 am

    I’m willing to bet we haven’t changed rsrobinson’s mind

  220. rsrobinson on August 21st, 2007 8:12 am

    I’m willing to bet we haven’t changed rsrobinson’s mind

    Hey, I like Adam Jones. He’s an exciting young talent with a world of potential. He’s obviously the future for the M’s in LF and Ibanez will probably either be dealt during the off-season to make room for him or will have to accept a role as a DH.

    What I’ve been questioning is the USSM conventional wisdom that playing Jones every day right now provides the M’s their best opportunity to get into the playoffs this year. I don’t see it as the no-brainer that many here seem to believe, and I’m not the only one. A lot of baseball people who’ve been in the game for decades (and not just McLaren) don’t necessarily agree with that either.

    Now people can either condescendingly trash anyone who questions or disagrees with USSM dogma on this issue (and I’ve seen plenty of that) or accept that both sides might have legitimate arguments and that reasonable people can disagree.

  221. Colm on August 21st, 2007 9:05 am

    rsrobinson:
    The blog’s been around and around on this issue. You are probably the most reasonable proponent of the “keep playing Ibanez and Vidro” viewpoint.

    If I’m reading you correctly your take is that Vidro and Ibanez are hitting well, especially during their hot streaks, so Jones should stay on the bench.

    I read the take of Dave and Derek and the other informed people here and see numbers that suggest the following. For the rest of the season it is reasonable to expect:
    Raul to post about .800 OPS (especially if kept away from LHP)
    Jones to post about .720 OPS, but save about five runs in the field vs Ibanez
    Vidro to post about .730 OPS

    Given all of that, the optimal alignment is to have Jones in the field all the time, Ibanez as DH against righties, with Vidro as a decent PH option and the starting DH against lefties.

    Your main argument against this is that you think Vidro will continue to put up superb production from the DH spot, as he has done for the last month.

    Your argument for believing that is that Vidro’s recent prolonged “warm” streak is more significantly predictive of his likely future performance than his performance over the past 3 seasons (weighted to give most significance to 2007).

    But you haven’t advanced a convincing argument for believing THAT.

    You have cited political reasons why management would be reluctant, and the conventional baseball wisdom that hot streaks DO have predictive significance.

    Am I misreading your posts, or do you have another, stronger case to make here?

    You can be reasonable, and we can disagree, but your opinion seems less well informed than that of Derek and Dave. Ergo I am more persuaded by them.

  222. adroit on August 21st, 2007 9:13 am

    Great post– and dang, I guess I need to go get The Book.

    I wish I’d read this sooner in order to perhaps get a response, but I want to question whether the assessment that Jones > Raul on 7/31 wasn’t also influenced by “streaks.” Raul’s up until that point was cold, and Jones’ being hot at AAA.

    Isn’t it all just a matter of sample size? First they increased the window from 5 to 7 days as defining a ‘hot’ streak– in Raul’s case we’re coming up on several weeks now. Keep going and you’d be including the whole of the previous three years. So how do we determine where to strike the balance between “expected performance” (based on 3 year averages) and “streak?”

    Weren’t the people who wanted Jones to replace Ibanez doing so based on Raul’s poor performance in 2007? To do so said nothing of his 2005 or 2006 performance, which would’ve been considered in determining his expected performance overall.

    That said, it’s an interesting discussion– and one that really challenges us to decide whether we prefer to think with our hearts or think with our heads.

  223. Colm on August 21st, 2007 9:21 am

    No adroit, I think you are misreading the arguments.

    Raul’s ‘streak’ is dragging his 2007 performance back to a level that seems very reasonable. Performance at that level is a good enough reason to keep him in the lineup, but not a good enough reason to keep him in the field.

    Dave’s main, original argument for replacing Raul with Jones in left field was not based on Jones putting up a .980 OPS in Tacoma (or whatever it was) but on his being a huge defensive upgrade.

    Platoon Raul and Vidro at DH and make Jones an everyday player.

    BTW, I haven’t seen anyone else describe two-thirds of a season as a ‘streak’.

  224. Jeff Nye on August 21st, 2007 9:21 am

    Maybe you should question why people so strongly hold their “USSM conventional wisdom” about Adam Jones.

    Maybe it’s because the authors here have taken the time to provide us with carefully researched information supporting their argument, while you consistently “refute” it with things like “it should be obvious to anyone who watches the games that Vidro is hitting the ball harder” and “a lot of baseball people who have been in the game for decades”.

    Aside from your one attempt to delve into wOBA, which you’ve latched onto to support your arguments while ignoring things like the unsustainably high BABIP that Chris Miller mentions, you’ve given us no reason to support your argument.

    And frankly, I find the continuing implications that anyone who disagrees with you is participating in some sort of mindless groupthink incredibly insulting and tiresome and I’m increasingly wishing there was a way to hide comments from specified users here.

    I agree with Dave and Derek (in the main) because they’ve given me well-thought-out reasons to do so. You haven’t. Stop implying otherwise, or go away.

  225. skyking162 on August 21st, 2007 9:27 am

    julian/210 — I think you’ve touched on something important here. “Streak” can have different meanings.

    One, it’s just a grouping of outcomes that appear abnormally similar (6 heads in a row or batting .500 over ten games).

    Two, a large set of data could be statistically significantly streaky, containing more than its fair share of groupings. For example, if you simulated a coin toss where each flip favored the previous outcome 60/40 (instead of normally 50/50), your data would contain “too many” streaks. In Julian’s college example, student-created sequences would contain too few streaks to have been created randomly.

    So, statisticians, is there a test you can perform on a sample of outcomes to judge how random it actually is — some sort of binomial test based on streaks?

    (Of course, such a test wouldn’t tell you if any particular streak was significant or not, it would just tell you whether or not you should bother looking for meaning in streaks in the first place.)
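
    One candidate, offered as a sketch and not as the definitive answer, is the Wald-Wolfowitz runs test: count the runs in a two-outcome sequence and compare that count with what independent outcomes would produce. The Python below simulates the 60/40 "sticky" coin described above and computes the test's z-score. The function names, seed, and sample size are arbitrary choices for illustration.

```python
import math
import random


def sticky_flips(n, stickiness, rng):
    """n flips where each flip repeats the previous outcome with
    probability `stickiness` (0.5 gives an ordinary fair coin)."""
    flips = [rng.random() < 0.5]
    for _ in range(n - 1):
        repeat = rng.random() < stickiness
        flips.append(flips[-1] if repeat else not flips[-1])
    return flips


def runs_test_z(outcomes):
    """Wald-Wolfowitz runs test z-score for a list of booleans.
    Strongly negative: too few runs (streaky). Strongly positive:
    too many runs (outcomes alternate more than chance would)."""
    n1 = sum(1 for x in outcomes if x)
    n2 = len(outcomes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need both outcomes present")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if variance <= 0:
        raise ValueError("sequence too short for the test")
    return (runs - expected) / math.sqrt(variance)


rng = random.Random(2007)  # arbitrary seed so the sketch is repeatable
print("fair coin z:  ", round(runs_test_z(sticky_flips(10_000, 0.5, rng)), 2))
print("sticky coin z:", round(runs_test_z(sticky_flips(10_000, 0.6, rng)), 2))
```

    As the parenthetical above says, a z-score like this only speaks to the sequence as a whole; it says nothing about whether any one streak inside it means anything.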

  226. rsrobinson on August 21st, 2007 9:30 am

    I read the take of Dave and Derek and the other informed people here and see numbers that suggest the following. For the rest of the season it is reasonable to expect:
    Raul to post about .800 OPS (especially if kept away from LHP)
    Jones to post about .720 OPS, but save about five runs in the field vs Ibanez
    Vidro to post about .730 OPS

    The main problem I have with this is that the prediction of Jones’ OPS is basically just a wild ass guess based on his AAA numbers and an estimation of what his current ability to hit major league pitching might be.

    There are some rookies who have come out of the gate hitting well but there have also been a whole lot of talented rookies who have struggled for awhile to adjust to major league pitching. Alex Gordon of the Royals, the second overall pick in the 2005 draft, is a current example. He’s been steadily improving at the plate as the season has gone on but he spent the first couple of months looking overmatched by major league pitching. The Mariners don’t have the luxury right now with Jones that the Royals had with Gordon of allowing him a couple of months to adjust if he struggles at first.

  227. Colm on August 21st, 2007 9:56 am

    But a projection based on AAA numbers is not “just a wild ass guess” it’s a projection based on what hundreds of other major league players hit upon promotion after putting up numbers in AAA.

    You chose Gordon as a reason we can’t expect Jones to hit well. You could also have chosen Hunter Pence or Ryan Braun or some other rookie who has hit well. That’s anecdotal evidence and is a poor predictive tool. You might as well say he’ll hit .350/.400/.550 for the rest of the year.

    The most reasonable expectation is that Jones will hit say, .800 OPS with a pretty large std dev either direction. Dave picked a conservative number around .720 OPS and factored in defense to arrive at his conclusion that having Adam Jones in left field gave the M’s a better overall chance of winning.

  228. skyking162 on August 21st, 2007 10:01 am

    226 – But the expectation for Jones is as valid as that of Ibanez and Vidro, just with higher variability. Sure, he might crap out, but that probability is countered on the high end such that the average is a reasonable midpoint.

    Now, variance might play into the decision, but if you’re the Mariners, you’d WANT higher variation, since you need to come from behind. Teams that are ahead would prefer consistency.

  229. Chris Miller on August 21st, 2007 10:25 am

    Minor league translations are not a guess, and a .720 OPS is conservative anyway, but yeah, the deviation is greater. I say give him more starts, and start benching Turbo. Despite what it looks like to our McLaren (who, I suspect, thinks BA is a good gauge of hitting), Turbo is as useless of a player as you could have starting, well, except maybe WFB.

  230. gwangung on August 21st, 2007 10:27 am

    The main problem I have with this is that the prediction of Jones’ OPS is basically just a wild ass guess based on his AAA numbers and an estimation of what his current ability to hit major league pitching might be.

    Hey, guess what? That’s what EVERY GM and Manager do when they promote minor leaguers to the bigs. And they really, really, really don’t usually promote minor leaguers with bad numbers (Rene Rivera excepted).

    This is above and beyond the FACT that minor league numbers ARE projectable. That’s the whole idea of statistics. The great majority of times, the numbers will pan out–there are exceptions, but those are the rather small minority of cases.

    What you’re doing is betting on the lower probability for the sake of certainty. The irony of this is a bit unsettling.

  231. rsrobinson on August 21st, 2007 10:46 am

    You chose Gordon as a reason we can’t expect Jones to hit well. You could also have chosen Hunter Pence or Ryan Braun or some other rookie who has hit well. That’s anecdotal evidence and is a poor predictive tool. You might as well say he’ll hit 350.400.550 for the rest of the year.

    Here’s an early evaluation of this year’s rookie class about five weeks into the season by Keith Law on ESPN Insider:

    “Twelve MLB rookies have at least 75 at-bats this year; Carlos Ruiz of the Phillies leads the group in batting average at .272, while only three of the 12 are slugging over .400. So you might say the rookie hitting class has been disappointing so far.”

    In general, this year’s rookie class didn’t come sprinting out of the gate. I did say that some rookies hit well immediately, as Hunter Pence did after being called up, while others have struggled. My point is there’s less predictability in estimating how rookies will perform compared to veterans. That makes Jones a gamble with just six weeks remaining in a tight pennant race because the team can’t afford any Alex Gordon-like struggles right now.

  232. Dave on August 21st, 2007 10:57 am

    Players making their major league debut this year are, as a group, hitting .264/.324/.418. That includes pitchers, which obviously drags the numbers down. If I take out anyone with less than 50 at-bats, I’m left with 25 players, and they’re hitting .272/.336/.439. The major league average hitter is hitting .269/.337/.420.

    Debuters > Better Than Average.

    You can point to Alex Gordon and Felix Pie all you want. It doesn’t change the reality of the existence of Ryan Braun, Hunter Pence, Josh Hamilton, Travis Buck, Billy Butler, or Mark Reynolds.

    Evidence is not on your side.

  233. gwangung on August 21st, 2007 11:10 am

    My point is there’s less predictability in estimating how rookies will perform compared to veterans. That makes Jones a gamble with just six weeks remaining in a tight pennant race because the team can’t afford any Alex Gordon-like struggles right now.

    And your evidence that Jones will struggle that mightily is….?

    Again, you’re embracing the LOWER probability in the name of certainty. Ironic, and not very smart.

  234. rsrobinson on August 21st, 2007 11:36 am

    Conversely, while you can point to rookie success stories that also doesn’t change the reality of the existence of Alex Gordon and Felix Pie. And, as I pointed out, even Gordon has improved over time to the point where his season numbers aren’t terribly below average. But he (and some of the other rookies who produced those numbers) had the time to adjust to MLB pitching over the course of the season, a luxury that Jones doesn’t really have going into September during a tight pennant race.

    And even if Jones does produce about average numbers for a rookie hitter that still leaves him less productive at the plate than either Vidro or Ibanez who both have season wOBA’s that are more than ten points higher than the MLB average, according to the statistical formula you introduced in your original post.

  235. Jeff Nye on August 21st, 2007 11:56 am

    I cannot even begin to comprehend why you’re willing to embrace the worst possible projected performance for Adam Jones just because he is a young player, despite an easily projectable strong track record from AAA; but you assume that the best case scenario for an older player who has a THREE YEAR HISTORY OF SUCKING prior to the last three weeks is his true talent level.

  236. gwangung on August 21st, 2007 12:00 pm

    Conversely, while you can point to rookie success stories that also doesn’t change the reality of the existence of Alex Gordon and Felix Pie. And, as I pointed out, even Gordon has improved over time to the point where his season numbers aren’t terribly below average. But he (and some of the other rookies who produced those numbers) had the time to adjust to MLB pitching over the course of the season, a luxury that Jones doesn’t really have going into September during a tight pennant race.

    So, basically, you want to GUARANTEE the worst case in order to make your case?

  237. gwangung on August 21st, 2007 12:01 pm

    I cannot even begin to comprehend why you’re willing to embrace the worst possible projected performance for Adam Jones just because he is a young player, despite an easily projectable strong track record from AAA

    Yeah, that’s embracing the lower probability for the sake of certainty.

    Casinos LOVE that kind of thinking.

  238. terry on August 21st, 2007 12:11 pm

    And now his 2007 wOBA, taking into account all hot and cold streaks over 420 ABs, is only slightly below his career average and is still above the major league average by more than ten points.

    An average of the five publicly available projection systems suggests Vidro would have a wOBA of .342 for ’07. After his first 363 AB, his wOBA was .338. During his last 54 AB his wOBA has been a staggering .437, bringing his seasonal wOBA to .353. Even after his August, Vidro is actually a below average DH based upon the wOBA of any DH with over 200 PA’s. Only two qualified DH’s are worse than Vidro (Sosa and Huff). I know you’ve started hedging your argument by backing away from suggesting streaks are predictive to now suggesting a comparison of Vidro’s ’07 season total with his career average is predictive. However, even based upon that criterion, Vidro has been below average relative to his true peer set.
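
    As a rough check on how those two stretches combine, here is a minimal sketch that weights each wOBA by at-bats. That weighting is an assumption made for illustration (wOBA is properly a per-plate-appearance rate), and the function name is hypothetical. It comes out near .351, close to the .353 quoted; the small gap plausibly comes from weighting by AB rather than PA.

```python
def combine_rates(samples):
    """Return the at-bat-weighted average of (at_bats, rate) pairs."""
    total_ab = sum(ab for ab, _ in samples)
    return sum(ab * rate for ab, rate in samples) / total_ab

# The two stretches as quoted: 363 AB at .338, then 54 AB at .437.
# Weighting by AB instead of PA is an approximation.
season = combine_rates([(363, 0.338), (54, 0.437)])
print(f"{season:.3f}")
```

    The point of the arithmetic is that 54 scorching at-bats move a 417 at-bat season line by only about a dozen points.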

    In reality you’re arguing that Vidro’s last 54 at bats inform more than his first 363 and not only that but those 54 at bats tell you more going forward than a survey of projection systems whose methodologies have been tested and whose results have had very high correlations to actual results in the past.

    The main problem I have with this is that the prediction of Jones’ OPS is basically just a wild ass guess based on his AAA numbers and an estimation of what his current ability to hit major league pitching might be.

    Not only are minor league numbers translatable into major league performance (i.e. this is what a player would’ve done in the majors based upon the numbers he put up in the minors-Thank you Bill James), but as more college and minor league data have become accessible, the performance of such players has become much more projectable as well (thank you Nate Silver et al.)

  240. rsrobinson on August 21st, 2007 12:13 pm

    I’m not projecting anything and have guaranteed nothing. Others here, though, have projected Jones’ predicted OPS numbers based on virtually no major league track record while also projecting OPS numbers for Vidro and Ibanez through the remainder of the season that are lower than the OPS numbers they’ve produced so far over 122 games and 420+ at bats.

  241. Dave on August 21st, 2007 12:13 pm

    Alex Gordon and Felix Pie, meet Andruw Jones, Johnny Damon, Richie Sexson, Nomar Garciaparra, Rafael Furcal, Scott Rolen, and Vernon Wells.

    Guess it’s not just young players who struggle.

  242. rsrobinson on August 21st, 2007 12:23 pm

    In reality you’re arguing that Vidro’s last 54 at bats inform more than his first 363 and not only that but those 54 at bats tell you more going forward than a survey of projection systems whose methodologies have been tested and whose results have had very high correlations to actual results in the past.

    So what you’re saying is that, while usually a larger sample size is preferable because hot streaks and cold streaks tend to balance out, in Vidro’s case a smaller sample size is preferable and hot streaks should be excluded?

    In other words, when Vidro hits poorly that’s representative of his current ability but when he hits well that’s not representative and therefore should be excluded?

  243. terry on August 21st, 2007 12:23 pm

    I’m not projecting anything and have guaranteed nothing. Others here, though, have projected Jones’ predicted OPS numbers based on virtually no major league track record while also projecting OPS numbers for Vidro and Ibanez through the remainder of the season that are lower than the OPS numbers they’ve produced so far over 122 games and 420+ at bats.

    You’re ignoring that not only is past minor league performance translatable to the majors, a future major leaguer’s performance is projectable as well.

    It’s a faulty argument to invoke the concept of having/lacking a track record because it begs too many questions that have been debunked (i.e. a player should be expected to perform to the back of his baseball card, minor leaguers require adjustment periods, until a proven level of performance has been exhibited it is impossible to reasonably estimate an expected level of performance etc….).

  244. heyoka on August 21st, 2007 12:25 pm

    Dave, I’ve respected your arguments until that one.
    The “Debuters > Better Than Average” argument is really convenient this year. So then, why doesn’t every team just debut someone at every position every year? Is that really a better strategy? That’s a case of cherry picking evidence. This year is a fluke for debuters.

    What were we “conservatively” predicting Jeremy Reed to do? The trouble with making predictions based on minor league numbers is the variance.

  245. terry on August 21st, 2007 12:28 pm

    So what you’re saying is that, while usually a larger sample size is preferable because hot streaks and cold streaks tend to balance out, in Vidro’s case a smaller sample size is preferable and hot streaks should be excluded?

    In other words, when Vidro hits poorly that’s representative of his current ability but when he hits well that’s not representative and therefore should be excluded?

    No.

    What I’m saying is that when projecting what to expect from Vidro during the remaining 40+ games of the season, it’s much more reasonable to expect a wOBA of .342 than one of .437 or even one resembling that of a league average DH (.364).

  246. arbeck on August 21st, 2007 12:28 pm

    rsrobinson,

    The difference is, we show our work. You don’t.

    There is a ton of information on how to project major league results from minor league numbers. With about a 90% confidence you can show that Adam Jones would put up an OPS of 700-850.

    Do you really think Vidro is going to hit .400+ for the rest of the year? Best case scenario for Vidro is that he hits at his current season averages. More likely, he hits at about what he did in the first half.

  247. Dave on August 21st, 2007 12:29 pm

    You’re reading a conclusion into a statement that wasn’t there.

    This year is not a fluke for debuters. There’s a giant myth about the relative unpredictability of guys without major league experience that you guys are buying into. It’s not true. Yes, some young players come up and struggle while others succeed. The same is true of guys with major league track records.

  248. rsrobinson on August 21st, 2007 12:50 pm

    What I’m saying is that when projecting what to expect from Vidro during the remaining 40+ games of the season, it’s much more reasonable to expect a wOBA of .342 than one of .437 or even one resembling that of a league average DH (.364).

    I don’t see a reason to expect either a .342 or a .437 wOBA based on the full 2007 statistical sample size. Vidro’s 2007 wOBA, according to my calculations, is .353 so that would seem to be the most reasonable expectation going forward, rather than either extreme.

    With about a 90% confidence you can show that Adam Jones would put up an OPS of 700-850.

    That’s a pretty wide range and which end of it he came closest to (presuming he wasn’t one of the 10% outside that range) would make a difference in his value as a defensive upgrade in LF.

  249. arbeck on August 21st, 2007 12:53 pm

    rsrobinson,

    Even at the low end, say 720, as Dave suggested, his defense would make the team better. Remember you are replacing Vidro’s bat against RHP with Jones. You don’t lose Ibanez’s bat.

    There is also the possibility he puts up an 850, or better. So the upside is much greater than rolling out Vidro and Ibanez everyday.

  250. julian on August 21st, 2007 1:03 pm

    From 225/skyking:

    So, statisticians, is there a test you can perform on a sample of outcomes to judge how random it actually is — some sort of binomial test based on streaks?

    (Of course, such a test wouldn’t tell you if any particular streak was significant or not, it would just tell you whether or not you should bother looking for meaning in streaks in the first place.)

    Well, you’d have to make some assumptions and define your terms (eg. “streak”), but such a test is certainly possible. In fact, there have been several interesting cases where such techniques have been applied to detect scientific fraud (scientists fudging data aren’t very good at making up “random” numbers in the same way that college students aren’t).

    An easy procedure for testing for randomness in coin tosses would be:

    1. Define a “hot streak” as X coin tosses in a row yielding the same result – this could be loosened to make the scenario more baseball-realistic

    2. Simulate N sequences of Y tosses. Record how many streaks of length X occurred in each of the N sequences.

    3. Calculate the number of streaks in your “real-life” data, and consider where that number sits in the distribution of the expected number of streaks estimated under the assumption that the data are truly random. If the number of streaks is in the extreme upper (way more streaks) or extreme lower (way fewer streaks) tail of the distribution, then you might conclude that there is evidence to suggest that the process which generated your data wasn’t entirely random.
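
    The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: a streak is a run of identical tosses, runs are counted without overlap, and the function names, the seed, and the alternating "real-life" sequence are all made up for the example.

```python
import random

def count_runs(tosses, run_len):
    """Count non-overlapping runs of `run_len` identical results in a row."""
    count = 0
    current = 0
    prev = None
    for t in tosses:
        current = current + 1 if t == prev else 1
        prev = t
        if current == run_len:
            count += 1
            current = 0   # start fresh so counted runs do not overlap
            prev = None
    return count

def simulate_run_counts(n_sims, n_tosses, run_len, p_heads=0.5, seed=0):
    """Step 2: run counts for n_sims random sequences of n_tosses each."""
    rng = random.Random(seed)
    return [
        count_runs([rng.random() < p_heads for _ in range(n_tosses)], run_len)
        for _ in range(n_sims)
    ]

def tail_fractions(observed, simulated):
    """Step 3: share of simulated counts at or below / at or above observed."""
    n = len(simulated)
    lower = sum(c <= observed for c in simulated) / n
    upper = sum(c >= observed for c in simulated) / n
    return lower, upper

sims = simulate_run_counts(n_sims=1000, n_tosses=200, run_len=5)
# Hypothetical "real-life" data: perfectly alternating, so it has no runs.
observed = count_runs("HT" * 100, 5)
lower, upper = tail_fractions(observed, sims)
print(observed, lower, upper)
```

    A very small `lower` or `upper` fraction is the "extreme tail" case described in step 3. The made-up alternating sequence lands far in the lower tail, which is the kind of too-regular data a person inventing "random" numbers tends to produce.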

  251. insidetheparker on August 21st, 2007 1:16 pm

    Hey Ryan Braun is a rookie. That means Adam Jones will hit .341 with 24 homers and 62 RBI…

    Give me a break!!!

  252. Jeff Nye on August 21st, 2007 1:40 pm

    …What?

  253. julian on August 21st, 2007 1:43 pm

    To follow up on my previous post, I decided to run the numbers and do a simple analysis to see how many “hot streaks” we would expect to see over the course of a season if batting were exactly like coin flipping (i.e. an at-bat is just like a coin toss, except with probability of heads closer to .3 than .5). We use the following default values:

    - Probability of a hit: 0.3 (i.e. a decent hitter)
    - Number of at-bats: 500 (i.e. roughly a season’s worth of ABs)
    - Number of simulations: 100 (i.e. 100 different seasons or 100 different players with the same inherent skill)
    - A streak is defined as a sequence of 20 at-bats where the proportion of hits is 0.45, i.e. the batter is batting 50% better than their average. Overlapping streaks are not allowed.

    We also assume that tosses (i.e. at-bats) are independent, which is probably a bit strong in reality but makes things a lot simpler. With these values, here’s a table showing how many streaks occurred over the 100 simulated seasons/players:

    1 2 3 4 5 6 7 8 9
    —————————-
    3 9 24 22 18 10 10 3 1

    So, overall, it looks like we should be expecting 4-5 such streaks over the course of 500 at-bats from a .300 hitter, but it’s not implausible to see as few as 2 or as many as 7-8.
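
    For anyone who wants to rerun this, here is a minimal sketch of one way to code it. The details are assumptions rather than the exact procedure used for the table: at-bats are scanned left to right, a streak is any 20 at-bat window with at least 9 hits (.450 or better), and the scan jumps past a window once it counts so streaks cannot overlap. The function names and the seed are made up.

```python
import random
from collections import Counter

def count_hot_streaks(at_bats, window=20, min_hits=9):
    """Count non-overlapping windows holding at least `min_hits` hits."""
    streaks = 0
    i = 0
    while i + window <= len(at_bats):
        if sum(at_bats[i:i + window]) >= min_hits:
            streaks += 1
            i += window      # jump past this streak so streaks cannot overlap
        else:
            i += 1
    return streaks

def simulate_seasons(n_seasons=100, n_ab=500, p_hit=0.3, seed=2007):
    """Tally streaks per season over independent at-bats with fixed p_hit."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_seasons):
        season = [1 if rng.random() < p_hit else 0 for _ in range(n_ab)]
        counts.append(count_hot_streaks(season))
    return Counter(counts)

table = simulate_seasons()
for n_streaks in sorted(table):
    print(n_streaks, table[n_streaks])
```

    With only 100 simulated seasons the tallies will bounce around from seed to seed, so expect the shape of the distribution to match rather than the exact counts.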

  254. julian on August 21st, 2007 1:47 pm

    Gah, formatting got messed up. Here’s a better view:

    # Streaks %
    1 3
    2 9
    3 24
    4 22
    5 18
    6 10
    7 10
    8 3
    9 1

  255. heyoka on August 21st, 2007 1:49 pm

    Hmmm Dave, I’m looking at recent years, and you’re right – it’s not a fluke year for debuters, there’s a “hunter pence” and “ryan braun” every year essentially.
    But I still believe you’re tweaking your “debuter” numbers. Does the average big leaguer get the less than 50 ABers removed? And if this is such a good strategy why isn’t everyone replaced with someone making his debut? (and shouldn’t we be looking at the median anyway?)

    back to the hot streaks: sometimes a sustained hot streak is explainable, many times an unreported injury healed or change in approach. Vidro’s hot streak is starting to lose its ‘streak’ quality. Perhaps there is actually an adjustment being made. I’ve noticed a better plate presence, and his hits have become less ‘bloopy’ and more solid ground balls up the middle. You’ve got to admire his ability to hit to the right side with a man on third. I didn’t like this guy when he showed up, and I still think we’d be better off without him next year (inevitable decline), but this year he’s earned some respect, and justified playing time. I’m not convinced Adam Jones will be an upgrade over him (moving ibanez to DH). [Though I still don't see why we keep Ibanez vs. LHP.]

    Speaking of hot streaks, when is Richie Sexson’s hot second half supposed to start?

  256. ghug on August 21st, 2007 2:00 pm

    255-November, let’s hope Seattle has a freak blizzard.

  257. Chris Miller on August 21st, 2007 2:02 pm

    Vidro’s hot streak is starting to lose its ‘streak’ quality

    It’s been a month, lots of guys end up hitting .400 for a month.

  258. Chris Miller on August 21st, 2007 2:03 pm

    This thread has been linked to by Tango: http://www.insidethebook.com/ee/index.php/site/article/streaks1/ – basically he agrees with Dave.

  259. julian on August 21st, 2007 2:40 pm


    It’s been a month, lots of guys end up hitting .400 for a month

    Let’s see what the numbers say to that (see post #253 for sim method). Assuming a .300 hitter:

    Streak length: 0 1 2 3
    % players/seasons: 76 17 6 1

    So about 25% of .300 hitters would be expected to have at least one month where they bat > .400. Alternatively, you could say that a guy like Vidro would be expected to have one month-long hot streak about once every four years on average.

  260. julian on August 21st, 2007 2:42 pm

    Quick correction to above, table should read:

    # of streaks: 0 1 2 3
    % players/seasons: 76 17 6 1

  261. rsrobinson on August 21st, 2007 2:50 pm

    Actually it sounds like Tango basically agrees with me, too. What I’ve been saying is that Vidro’s and Ibanez’s improved performance in the second half has now put their season numbers near their career norms, both now appear to be relatively healthy, and it’s likely they’ll continue to hit near their career norms (which are pretty good) for the remainder of the season.

    Obviously, it’s very unlikely that either will continue indefinitely on the pace they’ve been on recently (especially Ibanez who’s been hitting bombs at a ridiculous pace for the past two weeks) but there appears to be an assumption by some here that when their performance does drop it will return to their previous subpar levels of earlier in the summer rather than to expected career norms. I don’t see any real reason to believe that.

  262. Pete Livengood on August 21st, 2007 3:09 pm

    I’ve read this thread with great interest (great read, in general mostly interesting comments, but no reason to throw in my $0.02 since others were making all of the points I would make), but….

    “…there appears to be an assumption by some here that when their performance does drop it will return to their previous subpar levels of earlier in the summer rather than to expected career norms.”

    …man! rsrobinson, you seem to be a reasonably intelligent person, but who the hell is saying that? Dave’s post, and comments, specifically refute what you claim “some here” are saying. That was the whole point of Dave’s mea culpa about writing Raul off too early based only on 2007 performance and not giving enough credence to his 3-year numbers.

    I’m pretty sure, based on what Dave has written previously about Jones and Ibanez, and what he has written here, that he would say that this post is only about whether a guy like Ibanez’ or Vidro’s current streaks are predictive of them continuing to tear it up, as opposed to regressing to their previously established, 3-year means. I also think that Dave might say (based on previous posts) that even at those levels for Ibanez and Vidro, what we can project about Adam Jones (offensively and defensively) justifies getting him in the line-up even if they do.

    He’s not talking about clubhouse chemistry and politics, which he has acknowledged do (and to some extent should) play a part in how and whether that decision gets made. He’s just arguing the validity of the common, misperceived value of the streak as a predictive tool in making that decision. It’s amazing to me how many intelligent USSM readers do not seem to get this…..

  263. RealRhino on August 21st, 2007 3:18 pm

    Dave, I think your conclusions are incorrect and that they go too far. It was fine when you stated that a hot/cold streak has little predictive power. Fine.

    But in the last paragraphs where you apply the lesson to the Mariners, I just don’t think you can necessarily reach the conclusions you did about Ibanez and Jones. First, you conclude that if you thought Jones was better than Ibanez at the end of April, you should think so now. In one sense that’s true; if you believed an Ibanez hitting his expected wOBA and providing negative defense was worse than Jones, you should still think that. In another sense, however, and the one probably most commonly held by fans of this site and the M’s generally, I think that’s wrong. I think many people believed that Ibanez was worse than Jones because his bat had gone off a cliff, or that he was injured and unable to perform at expected levels. If that’s what you thought, then it would be reasonable for his hot August to change your opinion, not because you necessarily expected his hot streak to continue, but because I think this shows his bat has likely NOT gone off a cliff, and he is likely NOT injured and unable to perform. Still, I will accept that those people could be *wrong* about their opinion that the “normal” offensive Ibanez was better than the “expected” Jones to begin with, but if that’s what they thought, then their post-July opinions about the relative value of the two *should* have changed.

    More seriously, the comment about “whatever you thought of Ibanez on July 31 shouldn’t have changed” is just wrong, IMO. It has to have changed with the new data we have. Again, I would bet that almost everybody on the site would have said that Ibanez was done or that he was injured at that time. Not just a decline due to the aging process, but flat-out done. Not a “random variance” in his performance, but done. If his August doesn’t make you think, “Hey, maybe he’s not done, maybe he was hurt or just had one of those freak bad months,” then I don’t think you are paying attention.

  264. arbeck on August 21st, 2007 3:22 pm

    rsrobinson,

    No one is saying they are going to fall off the earth. However, Vidro and Ibanez are unlikely to hit at about their career norms for the rest of the year. If you really wanted to project accurately you’d use an amalgamation of their last 3 years of data, with the most recent weighted more than the oldest. I’ll use OPS+, since I like that stat.

    Ibanez has a career OPS+ of 110. This year he’s at 117. If I take his last three years and weight them 5/3/2, we’d get a projection of about 119. Since he is on the downside of his career, I’d say 115-120 is probably where he’ll be at. He will likely outperform his career, and be about at his season average going forward.

    Vidro is another case altogether though. Vidro has a career OPS+ of 109. But it’s been trending downward since ’02. Using the same weights as before I’d give him a projection of 104 for the rest of the year. So he’s not likely to even live up to his career averages. In fact, a 104 OPS+ may be too optimistic. He’s a career .302 hitter who is currently hitting .317. His slugging percentage has fallen every year since ’02, including this year! I’d consider it lucky if he puts up an OPS+ of 104 for the rest of the year.

    So what this says is that Adam Jones would have to put up about an OPS of 700-720 to replace Vidro’s bat completely going forward. And that’s without any defensive contribution. He could probably put up about a 600 and still contribute enough with the glove to make it a wash.
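
    For anyone who wants to check the 5/3/2 weighting, here is a minimal sketch of the arithmetic. The function name is made up, and the sample values are placeholders rather than either player’s actual season-by-season OPS+.

```python
def weighted_projection(values_recent_first, weights=(5, 3, 2)):
    """Weighted average of seasons, most recent season first."""
    if len(values_recent_first) != len(weights):
        raise ValueError("need one value per weight")
    total = sum(w * v for w, v in zip(weights, values_recent_first))
    return total / sum(weights)

# Placeholder OPS+ values, most recent season first. These are NOT
# Ibanez's or Vidro's real numbers; they only illustrate the arithmetic.
example = weighted_projection([120, 110, 100])
print(example)  # (5*120 + 3*110 + 2*100) / 10 = 113.0
```

    Because half the weight sits on the most recent season, a steady downward trend pulls the projection below a plain career average.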

  265. terry on August 21st, 2007 3:30 pm

    Actually it sounds like Tango basically agrees with me, too. What I’ve been saying is that Vidro’s and Ibanez’s improved performance in the second half has now put their season numbers near their career norms, both now appear to be relatively healthy, and it’s likely they’ll continue to hit near their career norms (which are pretty good) for the remainder of the season.

    Obviously, it’s very unlikely that either will continue indefinitely on the pace they’ve been on recently (especially Ibanez who’s been hitting bombs at a ridiculous pace for the past two weeks) but there appears to be an assumption by some here that when their performance does drop it will return to their previous subpar levels of earlier in the summer rather than to expected career norms. I don’t see any real reason to believe that.

    But what is a career norm for a 32 yo second baseman with Vidro’s history of leg injuries, career path and physique etc? A summary of the five projection systems suggests it’s a wOBA of .342….

    What about Raul? An average of the 5 projection systems suggests a wOBA of .350 which incidentally isn’t good enough to carry his glove.

    BTW, I seriously doubt Tango would agree that 400 AB is a large enough sample to inform the type of conclusion you’re drawing….

  266. rsrobinson on August 21st, 2007 3:36 pm

    …man! rsrobinson, you seem to be a reasonably intelligent person, but who the hell is saying that? Dave’s post, and comments, specifically refute what you claim “some here” are saying.

    I’m not talking about Dave, but about some of his more enthusiastic followers.

    He’s just arguing the validity of the common, misperceived value of the streak as a predictive tool in making that decision. It’s amazing to me how many intelligent USSM readers do not seem to get this…..

    And I agree that this is an interesting theoretical argument that may be statistically correct but I see very little chance it will ever be applied in the real world to any significant degree. No manager in baseball would sit and watch Raul Ibanez crush nine homeruns over two weeks and then bench him because there’s no proven predictive value in that performance. He wouldn’t do it for any number of real world reasons including the fact that he’d probably have a clubhouse revolt on his hands if he did.

    And this is one of my criticisms of sabermetrics purism. A lot of it is extremely useful but, IMO, there’s too much of a tendency to exclude the human factor in arguing for its application in the real world where all kinds of messy variables may make it impractical or downright foolish.

  267. gwangung on August 21st, 2007 3:50 pm

    And this is one of my criticisms of sabermetrics purism. A lot of it is extremely useful but, IMO, there’s too much of a tendency to exclude the human factor in arguing for its application in the real world where all kinds of messy variables may make it impractical or downright foolish.

    That’s because YOU aren’t taking into account the real world factors that can make it work.

    “Raul? We want to keep you fresh. We’ll set you down a day every week or so to keep you strong for the stretch. The last week–see what happens when you rest those hammies and shoulder?”

    “Ichi? We’ll have you DH a day a week to keep YOU fresh.”

    “Mr. Guillen…we’ll keep YOU fresh…”

    Hm. That adds up.

  268. julian on August 21st, 2007 4:00 pm

    Ok, maybe now I’m getting carried away, but…

    Some interesting analysis on the prevailing view that Raul was “finished” before his current hot streak:

    Raul’s BA on July 31: .253
    Rauuuul’s BA for his career: .283

    Probability (based on 1000 simulations) that a .283 hitter bats .253 or worse over his first 400 or so at-bats: 0.086 = approx. 9%
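
    The same figure can be cross-checked without simulation using an exact binomial tail. This is a sketch under the same simplifying assumptions: exactly 400 independent at-bats, a fixed .283 chance of a hit in each, and a made-up function name. It should land in the same neighborhood as the simulated value, with any difference down to sampling noise in 1000 runs.

```python
from math import comb, floor

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_ab = 400                          # assumed at-bat total ("400 or so")
max_hits = floor(0.253 * n_ab)      # most hits that still give <= .253
p_low = prob_at_most(max_hits, n_ab, 0.283)
print(max_hits, round(p_low, 3))
```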

    So, quantitatively, this is a fairly strong indication that the data was not generated by a .283 hitter, i.e. that Raul’s true skill had declined. Of course, Raul’s subsequent hot streak might suggest that for the first few months of the year, we were observing one of those 9% of seasons where a .283 hitter hit well below his average (

  269. Bernoulli on August 21st, 2007 4:06 pm

    The opposition to sabermetrics will always be linked to (baseball, not political) conservatism.

    No manager in baseball would sit and watch Raul Ibanez crush nine homeruns over two weeks and then bench him because there’s no proven predictive value in that performance. He wouldn’t do it for any number of real world reasons including the fact that he’d probably have a clubhouse revolt on his hands if he did.

    The thing is that winning is what matters. Every once in a while, a manager or a GM will change the way baseball is seen. Yes, usually that inspiration comes from a small-market, last-place team that has less risk of failure. But baseball has changed a lot in the last ten years, and apparently crazy ideas can take root once they’re seen to work. You can actually try to use your closer in non-save situations again. You can actually bat Juan Pierre eighth instead of first even though he’s the fastest guy on the team. You get the idea.

    If an idea helps teams win more games more of the time, it will eventually become accepted. Sabermetrics is the argument. Listen to the argument. Don’t just say “It’d never work, it’ll make the left fielder cry.” Besides, it’s far more interesting to talk about how to make the team better than it is to simply shout “Go M’s” two hundred and seven times per thread.

  270. Pete Livengood on August 21st, 2007 4:09 pm

    rsrobinson wrote:

    “. . . I agree that this is an interesting theoretical argument that may be statistically correct but I see very little chance it will ever be applied in the real world to any significant degree. No manager in baseball would sit and watch Raul Ibanez crush nine homeruns over two weeks and then bench him because there’s no proven predictive value in that performance.”

    First, nobody is suggesting “benching” Raul Ibanez (and that’s not what this thread is about). There is a suggestion that Jones in LF at least some of the time in place of Raul, and a reasonable platoon of Raul and Vidro at DH (again, at least some of the time) would be a better use of all three players’ skill sets and values.

    Second, the reason you might “bench” Ibanez (or Vidro) isn’t because there is no predictive value in their recent past performance, but because the most reasonable/accurate projections suggest the team might benefit if you did. Again, this isn’t a “real world” argument/post, and most reasonable people you are arguing with would concede that there are many other factors to consider before you would “bench” one of these guys.

    But you seem to be accepting as a truism (in the absence of “proof” to the contrary that could only be gathered if the manager did what you argue he shouldn’t because of the lack of that proof – play Jones more and platoon Vidro and Raul at DH some) that you should not consider these “purist” statistical arguments unless and until these guys come back to earth, which is to essentially buy into the predictive value of the streak. It won’t hurt Ibanez or Vidro to get a day off here and there (especially Raul vs. lefties), and as long as it isn’t taken too far, I don’t think there would be a clubhouse revolt. There just needs to be better balance, and ignoring the statistical argument because of its human impracticality doesn’t make the alternative more reasonable.

  271. rsrobinson on August 21st, 2007 4:57 pm

    It won’t hurt Ibanez or Vidro to get a day off here and there (especially Raul vs. lefties), and as long as it isn’t taken too far, I don’t think there would be a clubhouse revolt. There just needs to be better balance, and ignoring the statistical argument because of its human impracticality doesn’t make the alternative more reasonable.

    I’ve never said that Ibanez or Vidro shouldn’t get days off or that Jones shouldn’t be given the opportunity to play whenever possible. There’s still 40 games left and guys obviously need days off, especially considering the M’s brutal schedule down the stretch.

    I used Raul Ibanez’s recent streak as an example of a guy on a hot streak (a pace of nine homeruns in thirteen games is obviously unsustainable for any length of time) because it ties into the argument here. I don’t believe anyone in the clubhouse, including Adam Jones, thinks for a second that it’s a good idea to bench a guy who’s been torching the ball like Raul has lately. If the guy was legitimately tired and needed a day off you MIGHT be able to convince him of that.

    Players spend years of sweat and sacrifice, hour after hour in the batting cage, watching video, lifting weights, etc. to be able to get on a roll like Raul has been on over the past two weeks. Most would rather have their teeth pulled out with rusty pliers than be pulled from the lineup when they’re hitting like that and you’ll never, ever convince them this has no value in predicting how they’ll hit today. If you sit a guy hitting like that based on nothing more than a purist adherence to statistical probability then you risk losing not only him but the ballclub.

  272. terry on August 21st, 2007 5:38 pm

    Players spend years of sweat and sacrifice, hour after hour in the batting cage, watching video, lifting weights, etc. to be able to get on a roll like Raul has been on over the past two weeks. Most would rather have their teeth pulled out with rusty pliers than be pulled from the lineup when they’re hitting like that and you’ll never, ever convince them this has no value in predicting how they’ll hit today. If you sit a guy hitting like that based on nothing more than a purist adherence to statistical probability then you risk losing not only him but the ballclub.

    Willie Bloomquist probably thinks he has a 3 for 4, 3 rbi night in his bat tonight…should he start in center over Ichiro?

    Seriously, motivating players is a big part of managing, but it doesn’t trump the most important part of the job: fielding a group of players that gives the team its best chance of winning on a given night.

  273. Jeff Nye on August 21st, 2007 6:38 pm

    Just out of curiosity… do we know the name of Vidro’s agent?

  274. Chris Miller on August 21st, 2007 7:17 pm

    FWIW I think everybody is in agreement, outside of rsrobinson saying Ibanez and Vidro should be kept in the lineup, with AJ the one fighting for playing time. I think Ibanez and Vidro would make a fine DH combo w/ Ibanez and Jones splitting time in LF, as well as Jones spelling Guillen and Ichiro once in a while. That gives each one of them 5-6 games a week, leveraging platoon splits as much as possible. AJ’s glove is too good (relative to Ibanez) to keep out at this point.

  275. skyking162 on August 22nd, 2007 8:16 am

    Julian — good stuff, although I disagree with your conclusion in 268. 9% seems low, but if you look at all hitters expected to hit near .283, I bet about 9% of them hit .253 or under in their first 400 at-bats (although it’s also likely that some without solid starting jobs were benched before they could reach 400 at-bats.) Events with low probability aren’t automatically significant when you notice them — you wouldn’t have noticed them if they didn’t happen occasionally.
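
    For anyone who wants to check the 9% figure, here is a rough sketch. It assumes every at-bat is an independent trial with a fixed true average of .283, which is a simplification, and the function name is just made up for illustration. It sums the exact binomial probabilities for every hit total that gives a .253 average or worse over 400 at-bats.

```python
from math import comb


def prob_avg_at_or_below(true_avg, at_bats, observed_avg):
    """P(batting average <= observed_avg) over at_bats independent at-bats.

    Assumes each at-bat is a hit with the same probability true_avg.
    """
    # Largest hit total that still gives an average at or below observed_avg.
    # The small epsilon guards against floating-point rounding before flooring.
    max_hits = int(observed_avg * at_bats + 1e-9)
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(max_hits + 1)
    )


if __name__ == "__main__":
    p = prob_avg_at_or_below(0.283, 400, 0.253)
    print(f"P(.253 or worse in 400 AB for a true .283 hitter) = {p:.3f}")
```

    Under those assumptions the answer lands at roughly one in ten, so a figure near 9% is about what plain chance would produce.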

    And fyi, I’ve found stats articles referring to a runs-test and a runs-length test for randomness. Trying to get more info…
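
    In case it helps, here is a minimal sketch of what I understand the basic runs test (Wald-Wolfowitz) to be, using the large-sample normal approximation. It treats a hitter's at-bats as a sequence of hit / no-hit outcomes and asks whether the number of runs is unusual for a random ordering. The function name and the input format are my own assumptions, and this does not cover the runs-length variant.

```python
from math import erf, sqrt


def runs_test(outcomes):
    """Wald-Wolfowitz runs test on a sequence of truthy/falsy outcomes.

    Returns (number_of_runs, z_score, two_sided_p_value) using the
    normal approximation, which is only reasonable for longer sequences.
    """
    seq = [bool(x) for x in outcomes]
    n1 = sum(seq)            # e.g. at-bats with a hit
    n2 = len(seq) - n1       # e.g. at-bats without a hit
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need both outcome types and at least 3 observations")

    # A new run starts at every position where the outcome changes.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    # Mean and variance of the run count if the ordering is random.
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))

    z = (runs - mean) / sqrt(var)
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return runs, z, p_value


if __name__ == "__main__":
    streaky = [True] * 10 + [False] * 10
    print(runs_test(streaky))
```

    A strongly negative z means fewer runs than chance would give (streaky), and a strongly positive z means too many runs (the outcomes alternate more than a random ordering would).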

  276. Pete Livengood on August 22nd, 2007 9:51 am

    rsrobinson said:

    “I’ve never said that Ibanez or Vidro shouldn’t get days off or that Jones shouldn’t be given the opportunity to play whenever possible.”

    I realize you didn’t – my fault. I said that because that seems to be McLaren’s M.O. during the streak, and because you seemed to be defending McLaren’s use of the streak as a predictive factor (even if possibly only for reasons of player and clubhouse reaction).

    “I used Raul Ibanez’s recent streak as an example . . . because it ties into the argument here. I don’t believe anyone in the clubhouse, including Adam Jones, thinks for a second that it’s a good idea to bench a guy who’s been torching the ball like Raul has lately.”

    Without necessarily disagreeing with your prediction of player reaction, the inmates don’t run the asylum. All players understand that streaks end, and I think they would accept it if you told them: “Look, we’ve looked at this pretty carefully, and for most players, the fact that they’ve been rolling hot for a while doesn’t really predict that they’ll do better going forward than their usual longer-term averages, so we think we’re better off judging each coming game based on who has the best match-ups and gives us the best chance to win, rather than who’s riding a streak right now. We’ve got a brutal schedule coming up anyway, and everybody will be fresher this way.” For Raul, you would reference the fact that the team (reasonably) expects Jones will provide better defense in LF and that that will be relatively more important when certain pitchers are in the game, and that his numbers against lefties will probably determine which games he’ll sit. With Vidro, it’s a bit trickier, but he needs to understand that a guy who provides little more than singles and some OBP can be replaced by a guy like Raul at DH occasionally against RHP. You don’t really have to explain any of this to Jones, as he’s the beneficiary of all this. And none of them have to like it; it just has to work, and working provides its own justification.

    In the end, if it works and you’re winning, you won’t lose anybody. If Jones isn’t hitting or is providing poorer defense than we think he will over a longer sample than the few games he’s getting here and there now, nothing says you can’t adjust. Even within games, this gives you a much better bench….

  277. nathaniel dawson on August 22nd, 2007 2:00 pm

    Bravo, bravo!

    They are absolutely correct in The Book when they say that people make way too much out of way too little. And that remains true for many other issues in baseball analysis besides hot and cold streaks.

    And just like we really shouldn’t evaluate Raul Ibanez’ or Jose Vidro’s talent level any differently today than we did a month ago, nobody should have evaluated them differently in July than they did at the beginning of the season. Anybody that thought Ibanez was good enough to be in the starting lineup in April should have still thought so in July. Nothing that happens over a period of time as short as a partial season should have much bearing on an opinion of talent level.
