Explaining Away Regression To The Mean

Dave · July 21, 2009 at 7:43 am · Filed Under Mariners 

Odds are you’ve read a story lately about how Russell Branyan is struggling as he reaches the summer of his first season as a full-time player. After a monstrous first half, he’s not hitting as well lately, and the explanations are pouring in. He’s tired. His back hurts. Pitchers are figuring him out. Managers have figured out how to shift against him and he hasn’t adjusted. If you’re looking for a reason for Branyan’s struggles, you have a buffet of choices to blame them on.

Of course, there’s a simpler explanation – it’s just natural regression to the mean.

In April, Branyan posted a .405 batting average on balls in play. In May, it was .391. These are outrageously high totals that nobody in history has been able to sustain, much less a first baseman whose hardest hit balls end up in the seats. There was basically no chance that he’d be able to continue getting balls in play to find a hole 39% of the time. We talked about this quite a bit, warning that regression was coming. A guy who strikes out as much as Branyan does can’t hit .300. It’s almost impossible.

Indeed, regression did come. In June, his BABIP was a more normal .286, right around where we’d expect Branyan’s true talent level to be, based on his skillset. His monthly line was still a good .265/.376/.590, but the batting average didn’t get inflated by balls avoiding gloves in record numbers. July, though, has been uglier – .180/.288/.426, giving rise to all the various theories for the cause of the slump.

Branyan’s BABIP in July? .200. His other, more stable numbers?

13.6% BB% in July, 12.8% BB% for the season
33% K% in July, 28.5% K% for the season
.246 ISO in July, .292 ISO for the season

His walks and strikeouts are barely up and his power is very slightly down. Over 70 plate appearances, we’re talking about basically no difference at all. And, the extra strikeouts are actually just due to some coin flip calls by the home plate ump – his contact rate (69% in July) is higher than it was April-June (67%). There’s literally nothing to worry about here – Branyan’s slump is just normal BABIP variation. He got some good bounces in April and May and he’s got some bad bounces in July. He’s the exact same player he was, and reacting to the results will simply lead to making a bad assumption about what’s going on.
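To put a rough size on that variation, here is a minimal sketch that treats each ball in play as an independent coin flip with a fixed chance of falling for a hit. The .286 true-talent BABIP comes from the June figure above; the 40 balls in play for July is an assumption (roughly 70 plate appearances minus walks, strikeouts and home runs), not a number from the box scores, and the function names are just illustrative.

```python
from math import comb, sqrt


def babip_sd(true_babip, balls_in_play):
    """Standard deviation of observed BABIP under a simple binomial model."""
    return sqrt(true_babip * (1 - true_babip) / balls_in_play)


def prob_at_or_below(hits, balls_in_play, true_babip):
    """Probability of getting `hits` or fewer hits on `balls_in_play` balls in play."""
    return sum(
        comb(balls_in_play, k)
        * true_babip ** k
        * (1 - true_babip) ** (balls_in_play - k)
        for k in range(hits + 1)
    )


TRUE_BABIP = 0.286   # June BABIP, used here as the true-talent estimate
BALLS_IN_PLAY = 40   # ASSUMPTION: approximate July balls in play

sd = babip_sd(TRUE_BABIP, BALLS_IN_PLAY)
# 8 hits on 40 balls in play is a .200 BABIP
p_cold = prob_at_or_below(8, BALLS_IN_PLAY, TRUE_BABIP)

print(f"one-month BABIP standard deviation: {sd:.3f}")
print(f"chance of a .200 BABIP or worse by luck alone: {p_cold:.1%}")
```

Under those assumptions a single month's BABIP swings by something like 70 points in either direction from noise alone, so a .200 month from a .286 hitter is not a rare event.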

But this happens all the time. Not just with Branyan, but across the board. Remember Sean White’s struggles a few weeks ago? The local media decided it was because he was getting tired after being worked too hard for the first few months. White himself said he felt great, and had no problems, but that didn’t matter. He was giving up hits, and that meant he was running on fumes.

Sean White’s BABIP by month: .182, .182, .333 (he’s exhausted!), .125

White drastically overachieved the first two months of the season thanks to some good defense and good luck. The results started to match his talent level in June, and this was blamed on overwork. He’s been lucky again in July, but there’s no reason given to why he’s no longer tired. And remember, White claimed he felt great the entire time.

Players understand how this stuff works. Branyan was asked about why he’s slumping, and his response was basically “This stuff happens. The season is cyclical. Sometimes you run hot, sometimes you run cold.” (paraphrase because I can’t find the actual quote right now)

For whatever reason, though, people just can’t accept that there is not always a primary driving reason for a change in results. That’s why we get stuff like “so and so has changed his batting stance and is now hitting .500 for the last two weeks”, but you never hear about the new stance again after he goes back to hitting .260. Or, from a Mariner-centric point of view, you’ll hear a lot of talk about how the M’s need to keep their pitching rotation strong to keep the bullpen from regressing due to overwork.

Bad news – the bullpen is going to regress either way. Whether the M’s keep Bedard and Washburn or not, there are a bunch of relievers on this team with numbers that are unsustainable. The M’s bullpen has an ERA that is 0.69 runs lower than their FIP, and while the defense is a decent chunk of that, there’s a luck component in there too. Sean White and Chris Jakubauskas are running crazy low BABIPs. 1.8% of Aardsma’s fly balls are leaving the park. These numbers are going to regress. They have to.

And when they do, you’re going to hear explanations for why. White will be tired again. Jakubauskas will have lost the command of his fastball. Aardsma will be feeling the pressures of his first pennant race as a closer. We could write the stories right now. But, in the end, it’s just going to be simple regression to the mean, just like we saw with Branyan in June. He ran lucky for two months, had a normal Branyan month, and now is running unlucky. It doesn’t mean anything.

The sooner that we can get the world to embrace the concept of random variance, the better. Results fluctuate wildly in small samples due to uncontrollable factors. That’s just a fact of life, and when we’re forming our opinions, we need to realize just how powerful regression really is.

Comments

63 Responses to “Explaining Away Regression To The Mean”

  1. coasty141 on July 21st, 2009 8:12 am

    Excellent post Dave. Just curious, does using a shift harm a dead pull hitter’s (Ortiz, Giambi, Branyan) BABIP?

  2. firova2 on July 21st, 2009 8:14 am

    Great stuff. Branyan has said that teams are now doing a shift on him on the infield. That could have an effect on balls in play–perhaps a few more balls that would have gone for singles are being gloved instead. But not so many as to account for his July, I would expect. They don’t worry about him in the outfield. As you say, when he hits it out there, it’s pretty much gone.

  3. TheBird on July 21st, 2009 8:18 am

    The Branyan quote:

    “We’re going to go in cycles, we’re going to have our ups and downs. I’m not putting too much emphasis on it. I can start a hot streak tomorrow and be back up to .300 in less than a week. I’m just fortunate to be given an opportunity to play, and I’m happy the way I’ve been playing, and glad to be given a chance to come back and play.”

  4. msb on July 21st, 2009 8:18 am

    from Divish

    “As for his hitting struggles, Branyan downplayed them.

    “This is the first time I’ve played every day in a long time,” he said. “From a critic on the outside looking in, you can point your finger at a lot of different things – not taking my walks, things of that nature. It’s a long season. We’re going to go in cycles, we’re going to have our ups and downs.””

  5. CCW on July 21st, 2009 8:20 am

    All true, unfortunately. One thing we can hope is that there is room on this team for regression in a positive direction as well. Cedeno is the first guy who comes to my mind, as his BABIP has been a bit lower than one would expect. Beltre, too, but he’s not around to regress. Anyone else?

  6. CCW on July 21st, 2009 8:24 am

    Everyone pretty much agreed that the M’s were a .500 team (more or less) that, with some luck, could win 85-90 games and compete for a playoff spot. Luck is essential to their chances this year. Branyan has as good a chance to put up a 1.000 OPS in August/September as he did in April/May. I won’t count on it, but I sure can hope for it, and that’s part of the fun.

  7. 1000N on July 21st, 2009 8:34 am

    The technical term for this phenomenon is fundamental attribution error. For whatever reason, humans are determined to find causes for differences in outcomes that are due to nothing more than random variation.

    Excellent post, Dave.

  8. eponymous coward on July 21st, 2009 8:35 am

    For whatever reason, though, people just can’t accept that there is not always a primary driving reason for a change in results

    You can pick several from this convenient list, including “Disregard of regression toward the mean — the tendency to expect extreme performance to continue”. Baseball is FULL of demonstrations of these biases in thinking.

  9. DAMellen on July 21st, 2009 8:36 am

    [ot]

  10. Godori on July 21st, 2009 8:36 am

    Based on 1st half individual performances, are the M’s projected to regress to the mean in a positive way or negative way as a team if the roster stays intact (please trade Washburn)?

    Branyan is regressing in a negative way. I’m assuming Cedeno and Rob Johnson are regressing in a positive way.

    In short, using this regression to mean tool, should the M’s be sellers or buyers?

  11. BlackHaloBender on July 21st, 2009 8:39 am

    Regression to the mean doesn’t explain WHY he’s slumping, only that he is. The WHY in this case may be explained by balls just not finding gaps in the outfield or something equally un-actionable.

    The mean in this case is too single-season-focused too. We really have no idea what his real full-season stats should be like because the sample size is zero. He has played zero full seasons as a starter. Players have entire seasons well above their ability. And they have entire seasons well below their ability.

  12. floydr on July 21st, 2009 8:42 am

    [more than a little ot]

  13. 1000N on July 21st, 2009 8:43 am

    In short, using this regression to mean tool, should the M’s be sellers or buyers?

    “Regression to the mean” is not a predictive tool. It merely says that, given large enough sample sizes, a player’s performance will trend toward the mean. In other words, both kinds of streaks, lucky and unlucky are sustainable over the long term. However, given a recent lucky streak, a player is just as likely to have another lucky streak as an unlucky one.

  14. Steve Nelson on July 21st, 2009 8:43 am

    I believe that part of human psychology is the refusal to accept that this is largely a random universe. We won’t believe it, it flows against our inherent insistence that life has meaning, and we always look to find some reason for the phenomena we observe. We avidly anoint gurus who will interpret the signs and assure us that the world isn’t truly random, be those priests, shamans, Wall street commentators, psychologists, national security experts, environmental experts, or ESPN commentators. We will stare at a CRT screen that is nothing but random static and insist that we see patterns on the screen, even though we know full well that what we are viewing is totally random.

    In short people will embrace the concept of random variance when people stop being human beings.

  15. 1000N on July 21st, 2009 8:43 am

    I meant “unsustainable over the long term.” Sorry about that.

  16. Dave on July 21st, 2009 8:44 am

    Regression to the mean doesn’t explain WHY he’s slumping, only that he is.

    Because slumps are normal. It’s called variance. There is no reason for normal variance.

    We really have no idea what his real full-season stats should be like because the sample size is zero.

    Amazingly, we’re actually able to infer ability from non-full season data, and it works just fine. So, no, this idea that we have “no idea” what Branyan’s true talent level actually is – that’s bunk.

  17. Steve Nelson on July 21st, 2009 8:53 am

    Regression to the mean doesn’t explain WHY he’s slumping, only that he is. The WHY in this case may be explained by balls just not finding gaps in the outfield or something equally un-actionable.

    You have totally missed the whole concept of random variance and regression to the mean. As Dave mentioned, it is random. Random means there is no reason. You are insisting that there must be a reason for the regression.

    You are rolling a six-sided die. Each number has a 1/6 probability of occurring. After 30 rolls of the die, the number 3 has come up ten times (one-third of the time).

    Now after 100 rolls of the die the number three has appeared 20 times (one-fifth of time). This is regression to the mean. There is nothing different going on in the dice or your throws of the dice. There is fundamental cause driving you to suddenly start throwing 3s with less frequency in rolls 31 through 100 as compared with the first 30 rolls.

  18. Graham on July 21st, 2009 8:58 am

    Of course, introducing people to the concept of regression just leads to it being misused all the time so it’s a lose-lose situation.

  19. BlackHaloBender on July 21st, 2009 9:05 am

    Because slumps are normal. It’s called variance. There is no reason for normal variance.

    I understand that slumps and streaks are normal. But they can still be caused by something. What would you call Ichiro’s first MLB season? A long streak? For sure. But why was it so long? What would you call his second season (or whenever he came back to earth)? Simple regression to the mean? Normal variance?

    If you looked only at the stats maybe that’s what you would conclude. But really we know that a lot of it had to do with pitching adjustment and other variables.

    It seems like your point is: regression to the mean is caused by regression to the mean. We aren’t talking nuclear forces here.

    I understand that chance plays a large part in where balls end up in the outfield. I can wrap my head around that. But it seems like you could only prove that in a closed environment with limited variables.

    What am I not getting?

  20. lemonverbena on July 21st, 2009 9:08 am

    I enjoyed this piece 1.000% of times read.

  21. BlackHaloBender on July 21st, 2009 9:08 am

    “You are rolling a six-sided die. Each number has a 1/6 probability of occurring. After 30 rolls of the die, the number 3 has come up ten times (one-third of the time).”

    I understand all of that. I hardly think a die is a good metaphor for a major league at-bat.

    “Amazingly, we’re actually able to infer ability from non-full season data, and it works just fine. So, no, this idea that we have “no idea” what Branyan’s true talent level actually is – that’s bunk.”

    Yeah that’s true. But our understanding of his ability changes with more data.

  22. BlackHaloBender on July 21st, 2009 9:13 am

    “Now after 100 rolls of the die the number three has appeared 20 times (one-fifth of time). ”

    This gets sticky by the way. Because each time I roll a die the chances of me rolling a three are always one in 6. Even if I just rolled 99 3s in a row.

  23. The Ancient Mariner on July 21st, 2009 9:23 am

    Steve Nelson: my brother-in-law the chemical engineer would argue that most things aren’t truly random, but rather are chaotic: they’re unpredictable not because they’re uncaused but because they exhibit such sensitive dependence on such an intricate set of initial conditions that it’s utterly beyond our ability even to know what all the variables are, let alone to understand them sufficiently to predict their effects.

    As such, it’s not actually true that there is no reason for normal variance — which I take to be the core of BlackHalo’s protest — but rather that such variance comes not as a result of a small number of large and quantifiable reasons, but rather of a vast number of reasons too small for us to quantify and control for: the minute shifts in a pitcher’s grip and motion that produce variance in his pitches (the difference between a killer curve and one that hangs up over the middle of the plate, for instance); the differences in environment from stadium to stadium and game to game that also affect ball movement, both off the pitcher’s hand and off the bat; the variance in a hitter’s mechanics over the course of an at-bat, or a season, which make the difference between hitting the ball on the screws one night and just getting a little too far under it the next; split seconds in reaction time that affect whether a line drive is hit right at the second baseman or just too far to his left.

    All these tiny factors, too minute for us to quantify and analyze, multiply each other in every pitch, every swing of the bat, every ball in play, millions of times over the course of a season, to produce variance that — not in its essence, but as far as our ability to analyze and predict goes — we can only call random.

    The encouraging thing to me is that Branyan (and, it seems, Wakamatsu) understands this and isn’t getting too hung up about it.

  24. Gihyou on July 21st, 2009 9:26 am

    There is fundamental cause driving you to suddenly start throwing 3s with less frequency in rolls 31 through 100 as compared with the first 30 rolls.

    What? No, there is nothing ‘driving’ you to throw fewer 3′s after you throw a bunch of them. Previous rolls of a die have zero effect on the next roll.

  25. Tek Jansen on July 21st, 2009 9:27 am

    You are rolling a six-sided die.

    If you are rolling a six-sided die, you are most likely playing Dungeons and Dragons, in which case the performance of your elf, wizard, or halfling will also regress to its mean.

  26. TestaverdeTD on July 21st, 2009 9:36 am

    Do you think the need for a reason is simply ingrained in human nature or comes from years of “explanations” from managers/coaches/GMs?

    I know when I coach I hate (but I do realize) how much of struggles is pure white noise and really out of my hands, as opposed to something I might be able to help with or fix.

  27. julian on July 21st, 2009 9:39 am

    All these tiny factors, too minute for us to quantify and analyze, multiply each other in every pitch, every swing of the bat, every ball in play, millions of times over the course of a season, to produce variance that — not in its essence, but as far as our ability to analyze and predict goes — we can only call random.

    Amen. Very well put.

  28. Mike Snow on July 21st, 2009 9:39 am

    It seems like your point is: regression to the mean is caused by regression to the mean

    Regression to the mean is not a cause of anything. It is a phenomenon. You might call it an explanation, but we’re not talking about a cause-and-effect relationship; the whole point of Dave’s post is that looking for cause-and-effect often leads to analytical errors. That’s part of why Graham says regression to the mean gets misused when people misunderstand it.

  29. Evan on July 21st, 2009 9:41 am

    The technical term for this phenomenon is fundamental attribution error. For whatever reason, humans are determined to find causes for differences in outcomes that are due to nothing more than random variation.

    Philosophers have been trying to get people to stop doing this for 3000 years, but it’s not working. The existence of religion relies on this error: Why did my brother die in the accident but I survived? Must be god.

    Rational analysis appears to be beyond a lot of people.

  30. DMZ on July 21st, 2009 9:46 am

    No! Don’t go there! Don’t do it!

  31. bionicjim on July 21st, 2009 9:51 am

    Remember 2001 – the regression didn’t happen until October. Maybe we can get lucky again?

  32. Sports on a Schtick on July 21st, 2009 9:53 am

    I feel comments will regress this post to the mean.

  33. Manzanillos Cup on July 21st, 2009 9:55 am

    I believe that part of human psychology is…

    We avidly anoint gurus who will interpret the signs and assure us that the world isn’t truly random, be those … psychologists

    I’m all confused.

  34. cdowley on July 21st, 2009 10:00 am

    Ugh, people are having a hard enough time with (relatively) basic statistical theories like regression. PLEASE don’t introduce Chaos Theory to them, you might make their heads implode…

    Anyways, excellent post, and it gives me excellent ammo to use in an ongoing argument about Branyan I have going with a coworker.

  35. CCW on July 21st, 2009 10:04 am

    It’s such a simple concept really. Branyan does not have the skills of a .300 / .400 / .600 hitter. Therefore, he will not hit .300 / .400 / .600 over an extended period of time. The term “regression” only comes into play because he did hit at an inflated rate for two months to begin the season.

    The key is to understand where the “mean” is, which is where ZIPS, PECOTA, etc. come in.

  36. 1000N on July 21st, 2009 10:09 am

    Remember 2001 – the regression didn’t happen until October. Maybe we can get lucky again?

    This isn’t quite right. True regression to the mean requires an infinite number of additional samples. Using the previously mentioned dice example. If I start out by rolling 10 sixes in a row, I don’t ever actually expect that the number of sixes will “regress” to 1/6 of the total. I only expect that I’ll get there asymptotically. Thus, after 100 rolls, I expect a total of 25 sixes (the 10 I already had + 1/6 of the remaining 90). The lucky streak at the beginning merely means that regression to the mean should take me back to about 25% of the total after 100, not 16.67%. After 1000 rolls, I “expect” to be at 175 sixes, or 17.5% of the total, not 16.67%.
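
    The arithmetic in that dice example can be checked with a short sketch. It assumes a fair die and takes the opening streak of 10 sixes as already banked; the function name and the larger roll counts are illustrative additions, not part of the comment.

```python
from fractions import Fraction


def expected_share(streak_hits, streak_rolls, total_rolls, p=Fraction(1, 6)):
    """Expected share of sixes after total_rolls, given an opening streak.

    The streak is already in the books; only the remaining rolls are
    expected to come up six at the true rate p.
    """
    remaining = total_rolls - streak_rolls
    expected_hits = streak_hits + p * remaining
    return expected_hits / total_rolls


for n in (100, 1_000, 10_000, 1_000_000):
    share = expected_share(10, 10, n)
    print(f"{n:>9} rolls: expected share of sixes = {float(share):.4f}")
```

    The expected share keeps drifting toward 1/6 as the roll count grows without ever being pulled there by a run of "make-up" rolls; the early streak is simply diluted.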

  37. Chris_From_Bothell on July 21st, 2009 10:09 am

    Bravo to “The Ancient Mariner” above. He summed up this topic perfectly, for those who simply cannot be satisfied with describing some things as just “random” or “luck” or “we don’t know why”.

    Looking much further into this is at best a tedious exercise in semantics, and at worst is scientific wankery that is annoying as yesterday’s Rob Johnson discussion.

  38. Wolfman on July 21st, 2009 10:09 am

    I can’t wait for the Angels to regress back to their mean. They’re really getting on my nerves!

  39. TranquilPsychosis on July 21st, 2009 10:13 am

    Manzanillos Cup on July 21st
    I’m all confused

    How can you confuse something that doesn’t exist?

  40. Elwood P. Dowd on July 21st, 2009 10:15 am

    Thank you for the excellent post, I also thank Ancient Mariner for his. I don’t mind regression to the mean being attributed to new eyeglasses, new batting stance, fatigue, etc. What annoys me is when this sort of thing gets attributed to character, or lack thereof.

  41. Nuss on July 21st, 2009 10:19 am

    But what would columnists and radio talk show hosts have to write or talk about if there wasn’t some “explanation” for some phenomenon?

    Truly, I think the root of it is this: People have a very, very difficult time accepting that something like luck can influence their sporting events. They need to believe that the guys they root for have complete control over the outcomes; to believe that they don’t goes against everything you’re taught growing up about sports (e.g. “If you work hard enough you’ll be a great player”).

    No matter how intuitive it is — what if the shortstop had positioned himself two feet to the right instead of the left? — people just don’t know what to do with the idea that results can be based on something other than execution. Remember how Billy Beane got crucified for his quote about the playoffs?

    Crash Davis had it right. Too bad people won’t accept why he was right.

  42. John S. on July 21st, 2009 10:21 am

    Good post.
    I had to laugh earlier this year when you read (and heard) all about these “eye exercises” Branyan was doing and how it was partly responsible for changing him into this monster hitter.

    While I believe putting in hard work, whether it’s watching candles (Edgar) or something on a computer screen, or putting more time in the cage, or a pitcher learning a new grip can help improve your performance, I absolutely agree that every ballplayer has a basic mean that their career will follow based on their skillset.

    We see young players improve all the time simply from experience, but the “talent” was always there – or not there. Some players work harder than others, which can also boost performance somewhat, but you can only do so much with what you’ve been given. Thus the attraction to PEDs.

  43. Breadbaker on July 21st, 2009 10:25 am

    I am still curious about the shift issue. At least by observation, there have been a number of times when Branyan has hit the ball straight at a second baseman playing short right field which would have been hits if there were no shift on, while I don’t recall instances where he’s been able or willing to hit it down the left field line where the third baseman is not.

    Regression to the mean, in other words, may have some human intervention involved from the reaction of the fielding team. No one is spending a lot of time right now figuring out how best to defense Ronnie Cedeno, while they clearly spent time working on Russell Branyan. In each instance, this may have an effect on their respective regressions to the mean. If Ronnie had a month of .300/.400/.500, someone might spend some time on his scatter charts.

  44. 1000N on July 21st, 2009 10:34 am

    I absolutely agree that every ballplayer has a basic mean that their career will follow based on their skillset.

    Er, OK. The main emphasis of this thread, that deviations from expected means over small samples may have no explanation other than random variation, should not be confused with explainable deviations over large sample sizes: Yuniesky Betancourt has seen his “basic mean” diminish over time because he hasn’t worked hard enough to maintain it, while David Eckstein has seen his “basic mean” improve relative to his basic skillset for the exact opposite reason. Sometimes a player really can’t perform up to his usual level because he’s hurt or is having trouble sleeping at night, or whatever.

    The problem is the phony explanations given to explain the random deviations in small sample sizes, the fundamental attribution errors. Just because *some* things can’t be properly attributed to anything other than luck doesn’t mean that *all* things can’t be attributed.

  45. nathaniel dawson on July 21st, 2009 10:35 am

    As such, it’s not actually true that there is no reason for normal variance — but rather that such variance comes not as a result of a small number of large and quantifiable reasons, but rather of a vast number of reasons too small for us to quantify and control for…..

    Your whole comment is the best explanation I’ve ever read concerning random occurrence in baseball. Thank you.

    Could we teach this in Sabermetrics 101?

  46. Paul B on July 21st, 2009 10:46 am

    No one is spending a lot of time right now figuring out how best to defense Ronnie Cedeno, while they clearly spent time working on Russell Branyan. In each instance, this may have an effect on their respective regressions to the mean.

    What Dave is doing is using tools like BABIP to check to see if the results we are seeing are natural variation or something else. So when he concludes that we are seeing regression to the mean, he has evidence that points to the conclusion he is making.

    Counter example (simplified): If Branyan’s BABIP was staying relatively flat month to month, but his K% was increasing or decreasing significantly, then there may be a physical cause.

  47. Paul B on July 21st, 2009 11:04 am

    I am still curious about the shift issue. At least by observation, there have been a number of times when Branyan has hit the ball straight at a second baseman playing short right field which would have been hits if there were no shift on, while I don’t recall instances where he’s been able or willing to hit it down the left field line where the third baseman is not.

    Think about what we would see statistically if the shift was having a big impact on his performance. Then look for it.

  48. DoesntCompute on July 21st, 2009 11:23 am

    Player A’s talent is such that he should hit .300 if given an infinite number of at bats.

    Player A now changes his batting stance and it improves his true talent level to be .305. This change means that he will get 1 more hit every 200 at bats.

    If opponents implement a shift at all times and player A is only able to bat .290 against that configuration then the opponents are taking away 1 hit in every 100 at bats (from the original). Anything outside of that variance is random.

    Now a true-talent .330 hitter is more likely to hit .340 for a period of time than is a .250 hitter, just as a six sided die is more likely to roll a 1 than is a 20 sided die.

  49. BlackHaloBender on July 21st, 2009 11:40 am

    As such, it’s not actually true that there is no reason for normal variance — but rather that such variance comes not as a result of a small number of large and quantifiable reasons, but rather of a vast number of reasons too small for us to quantify and control for…..

    Thank you. That is what I was trying to say.

  50. SeasonTix on July 21st, 2009 11:52 am

    Dave,

    When are you taking a job in the M’s new Dept of Baseball Statistics?

    They’ve probably already contacted you because we know they read USSM and your statistical analysis is amazing.

    Is that something you are interested in doing when you get out of college?

  51. metz123 on July 21st, 2009 12:04 pm

    Where I have a tough time with regression to the mean is when it is applied to an entire team.

    For example: let us assume that the M’s have the talent of a .500 club. However, they start the season off with a 10 game win streak. Is it expected that over the course of an entire season they will regress and have enough losses that they end up at .500 or are those 10 games “in the bank” and it is expected that from that time forward they are a .500 club and thus will finish up at 10 games over .500?

    Is that trying to use regression to the mean as a predictor of future success, which it shouldn’t be used for?

  52. DMZ on July 21st, 2009 12:06 pm

    We’ve discussed this like a million times already. It’s the second. And no.

  53. Dave on July 21st, 2009 12:07 pm

    Regression to the mean doesn’t know anything about history. What happened in the past doesn’t matter.

    If a .500 team wins their first 10 games of the season, but your opinion of their true talent level doesn’t change, then you now expect them to go 76-76 the rest of the way, and finish with an 86-76 record.

    You do not expect them to go 71-81 so that they finish back at .500. That would be regression past the mean. It’s a logical fallacy.
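
    As a minimal sketch of that projection: the banked record stays banked, and the true talent level applies only to the games still to be played. The function name is hypothetical, and the spread figure assumes every remaining game is an independent coin flip at the true win percentage, which is a simplification.

```python
from math import sqrt


def projected_record(wins, losses, true_win_pct, season_games=162):
    """Expected final record, plus the standard deviation of final wins."""
    remaining = season_games - wins - losses
    expected_wins = wins + true_win_pct * remaining
    expected_losses = season_games - expected_wins
    # ASSUMPTION: remaining games are independent with a constant win chance
    sd_wins = sqrt(remaining * true_win_pct * (1 - true_win_pct))
    return expected_wins, expected_losses, sd_wins


wins, losses, sd_wins = projected_record(10, 0, 0.5)
print(f"expected finish: {wins:.0f}-{losses:.0f}, give or take about {sd_wins:.0f} wins")
```

    Regressing "past the mean" would mean plugging in a win percentage below .500 for the remaining 152 games, which nothing about the 10-0 start justifies.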

  54. dchappelle on July 21st, 2009 12:29 pm

    Have you guys seen any research examining differences in the standard deviation of baseball stats? I think I’ve read before that speedy baseball players tend to have higher career BABIP numbers than others.

    Have you seen any examination of the skillset that might result in a more consistent (lower SD) result? It seems that would be a desirable item to examine, although I’d guess we simply do not have the data to come to a reasonable conclusion.

  55. heyoka on July 21st, 2009 12:46 pm

    Regression to the mean may not be the explanation for a lower BABIP if an infield shift is going to permanently lower that number.

    We then may be talking about an entirely different kind of regression. But still out of Branyan’s control (or is it?) and eliminating the “he’s tired” argument.

    As for the bullpen’s eventual regression, I still see that as an argument for having a strong rotation to give the bullpen fewer innings with which to regress and lose games with its true talent level.

  56. bilbo27 on July 21st, 2009 1:37 pm

    Almost nothing in the universe is actually random (speaking with a ridiculous amount of background in mathematics and physics). There is always a reason governed by the laws of the universe. The problem is that at times it is impossible to determine the real “why” because (sometimes) we don’t have the technology and other times because there is too much “noise” to make an accurate determination as to what is truly causing something. So it’s a matter of us through time and experimentation, weeding out the noise from the variables that actually matter.

    A case in point is something we use every day, a computer. Pretty much all languages have a “random” number generator (and most programs use a “random” number somewhere). However, in every single case this “random” number generator is completely deterministic. It’s impossible on anything but a quantum computer (which for now is impossible to make) to actually generate a random number. However, the mathematical functions developed to produce these “random” numbers are such that they make a nice scatter plot on a chart given various inputs (usually the time). So the results appear very random, but in fact are deterministic, same input = same output. But, good luck trying to find the “why” (the function that governs the input/output pairs). If it’s a good mathematical model, it will be impossible with current technology to figure out. (note: not all are good; many in fact are not. The C language, for instance, is crap and it’s very easy to determine their “random” number generator function).
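
    The "same input = same output" point is easy to see in a small sketch. This uses Python's standard `random` module, which is documented as a deterministic pseudo-random generator; the seed values and the `draw` helper are arbitrary illustrations.

```python
import random


def draw(seed, n=5):
    """Roll a six-sided die n times using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randint(1, 6) for _ in range(n)]


first = draw(2009)
second = draw(2009)   # same seed: the "random" rolls repeat exactly
other = draw(2010)    # different seed: a different, equally fixed sequence

print(first)
print(second)
print(other)
print("same seed reproduces the sequence:", first == second)
```

    The rolls look patternless, yet anyone who knows the seed can reproduce every one of them.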

    This was just a simple example with computers, but this type of thing is reflected in nature all over the place. Almost nothing is random (and I only say “almost” instead of just “nothing” because of various things in quantum mechanics that we don’t yet fully understand). But these types of things don’t show up in baseball, so nothing in baseball is random. There is always a reason. It’s just that the reasons are often much too complicated for us to be able to tell. Most baseball “experts” then will say “it was because he changed his stance”, even though that has nothing to do with the actual reason. Often the actual reason might be that he happened to hit the ball 1/8th of an inch higher on the barrel and it allowed the ball to hang up just enough for the outfielder to catch it, or “random” things like this that players have almost zero control over. Which of course in turn gives rise to us calling it “random” for all practical purposes. But in the end there is always a reason, even if it’s not within our grasp to see in every case.

  57. Ralph_Malph on July 21st, 2009 1:53 pm

    I would think the shift would decrease BABIP only if the batter doesn’t adjust to it.

    If the batter adjusts to it by going the other way some of the time, he should be able to increase his BABIP at the expense of his power. Which would seem to be one of the purposes of the shift — to invite the batter to make an adjustment that will decrease his likelihood of hitting a home run.

    It would be interesting to look at Branyan’s hit distribution over the course of the season to see if he’s going the other way any more than he was earlier in the season. That would be an indication that he’s adjusting to the shift.

    In my limited observation, I’ve seen Branyan lose several hits to the short right fielder but I’ve only seen him go the other way once. To his credit, Griffey did poke a ground ball through the vacant shortstop hole the other night against the shift for an RBI.

  58. Breadbaker on July 21st, 2009 2:33 pm

    In my limited observation, I’ve seen Branyan lose several hits to the short right fielder but I’ve only seen him go the other way once. To his credit, Griffey did poke a ground ball through the vacant shortstop hole the other night against the shift for an RBI.

    Given the relative skillsets of Branyan and Griffey at this point in their respective careers, it might make more sense for Griffey to adjust to the shift that way while Branyan should keep doing what he is doing, even if he loses some singles.

  59. heyoka on July 21st, 2009 5:33 pm

    wow bilbo27, that’s deep.

    However, in the context of baseball performance and what baseball players can reasonably control, I think we can call certain variances random.

    When you put randomness (or the lack thereof) in those mathematical terms you are taking them out of human context (as we are not god-like creatures who are going to understand all the universal forces that lead to “non random” events). Language that occurs between humans carries lots of implications. So, for sake of argument with other humans, let’s just imply that events are perceivably random when we use the word “random”.

  60. heyoka on July 21st, 2009 5:36 pm

    ….“random” things like this that players have almost zero control over. Which of course in turn gives rise to us calling it “random” for all practical purposes.

    Which you already said. I’m an idiot for writing that last post. :)

  61. MKT on July 21st, 2009 7:52 pm

    Some nice explanations of regression to the mean and randomness here. A decent book for further reading (non-technical, no statistics background required) is Nassim Taleb’s “Fooled By Randomness”, which goes over these and the many other ways in which people in everyday life misuse or misperceive probability and statistics. He’s become more famous for his “Black Swan” book, which is also good but gets a little more into grand metaphysical speculation (almost inevitable, since that book is about major events which cannot be predicted).

  62. Sidi on July 21st, 2009 10:10 pm

    Almost nothing in the universe is actually random (speaking with a ridiculous amount of background in mathematics and physics). There is always a reason governed by the laws of the universe.

    You’re discounting quantum physics? Because (as I understand most of it) there is a huge element of randomness and pure probability in the universe. I’m not criticizing, my degree is in biochem and I can’t help but believe that the clockwork theory wasn’t invalidated by Heisenberg…he just stated that we aren’t ever going to be able to read the clock without blindly feeling the hands. I personally subscribe to the modified clockwork theory many above have described.

    I find it strange that people can’t understand randomness. Almost everyone has played a sport, or a game, or a video game, or tossed rocks at a stump. I know you want to justify running three racks in pool, or winning five times in a row when you had 60/40 hand odds, or whatever. But when you accept random it really does help once you come back down.
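    For a sense of scale on the 60/40 example, here is a minimal sketch that works out the chance of winning five in a row when each game is an independent 60% shot, then checks it by simulation. Independence between games is an assumption, and the function names are made up for illustration.

```python
import random


def streak_probability(p_win, streak_length):
    """Exact chance of winning `streak_length` independent games in a row."""
    return p_win**streak_length


def simulated_streak_rate(p_win, streak_length, trials=100_000, seed=0):
    """Fraction of simulated sets of games in which every game is won."""
    rng = random.Random(seed)
    streaks = 0
    for _ in range(trials):
        if all(rng.random() < p_win for _ in range(streak_length)):
            streaks += 1
    return streaks / trials


exact = streak_probability(0.6, 5)  # 0.6 ** 5 = 0.07776
simulated = simulated_streak_rate(0.6, 5)
print(f"exact: {exact:.4f}, simulated: {simulated:.4f}")
```

    So a five-game run at 60/40 odds should turn up a bit under 8% of the time from chance alone, often enough that it doesn't need any explanation beyond the odds.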

  63. Sidi on July 21st, 2009 10:13 pm

    And I did see the commentary below about quantum physics, but the way it’s being treated now seems to lead to randomness on a much larger scale than electrons just deciding to appear on the other side of the galaxy.
