# This week’s PI bit: RoboIchiro 2000

DMZ · June 15, 2005 at 10:47 pm · Filed Under Mariners

My piece in the PI this week (“Is something wrong with Ichiro”) deals with a topic that came up here earlier (in Dave’s post “What’s wrong with Ichiro”).

It goes in a direction I started to get into a little in the comments: the smaller the sample you look at, the farther wrong you can go. It also features RoboIchiro 2000.

### Comments

**19 Responses to “This week’s PI bit: RoboIchiro 2000”**

Great article.

We talk about averages – batting, OBP, etc. – but rarely talk about variance (deviation in stat talk.)

But having wildly variant hitters is how you get the “big inning” so it’s not so bad.

Wouldn’t it make sense that Home Run hitters would be more variation-prone while singles hitters like Ichiro would not? Or is it roughly equal?

I found this article over on BTF:

http://www.baseballthinkfactory.org/btf/scholars/ruane/articles/consistent_hitters.htm

The most inconsistent hitters were Babe Ruth, Roy Campanella, and George Brett. All guys with power.

The players who appear multiple times on the consistent list include Stuffy McInnis, George McBride, Doc Cramer, Graig Nettles and Pete Rose. Nettles had power; McBride, Cramer and McInnis were mainly singles hitters, and Rose was Rose.

I’m curious, DMZ… in your 10 simulated seasons, did Ichiro ever have 2 months in a row of .275 or worse hitting? How about a few months in a row of .400 or better? In the past 4 years, he has done both these things (the former, twice).

Nice – now if only it appeared in print. Any possibility of that in the future?

Since Dave’s post a couple of days ago I have been studying Ichiro’s month-to-month variance (Vmtm) and comparing it to Johnny Damon’s. Damon was the first comparable player who came to mind, in that he’s been a leadoff hitter for the same club for the 20 months in Dave’s sample. Some preliminary conclusions:

1. Ichiro is roughly twice as variable as Damon (std dev Ichiro=.066, Damon=.036).

2. Damon’s Vmtm corresponds pretty well to the classic bell curve, while Ichiro’s appears to be bimodal (a high peak and a low peak).

3. Ichiro’s worst third corresponds well to Damon’s worse half, his best third to Damon’s better half, and his top third to, well, no one else in the game. “Ichiro: way better than Johnny Damon 1/3 of the time.” 🙂

The numbers beg the question, is there a reason that Ichiro would be more streaky than Johnny Damon? I propose a few theories:

1. Ichiro is uncomfortable with and/or resistant to making adjustments to his swing and approach at the plate.

2. Ichiro’s swing and approach at the plate is so unorthodox that a hitting coach is of little help.

3. Ichiro’s swing and approach at the plate is just inherently hard to make adjustments to.

Of course, there could be other players as streaky as Ichiro. If anyone can offer a comparable (leadoff hitter for the same club since 2001) I’d be glad to run the numbers.

#6 – What’s your mathematical/statistical background/occupation? Can’t help but notice your level of knowledge (don’t often see stuff about Black-Scholes on sports websites…).

I’m a finance guy who paid attention in algebra, calculus and 300- and 400-level biometrics and econometrics at WSU. BTW, I’m not the one who mentioned Black-Scholes, which I don’t understand other than it’s used to value out-of-the-money stock options. Mentioning Black-Scholes makes for good snark but I don’t see what on the field we’d apply it to.

Another question would be: is the batting average of a player whose value is more heavily invested in his batting average more variable than the batting average of a player who also has other offensive skills?

It could well be that players who draw more walks or hit for more power have less variable batting averages, even if the batting averages are similar.

Occurrence of 20 monthly batting averages in quartile ranges based on probability of occurrence at random:

Random hitter: 5-5-5-5

Johnny Damon: 3-6-7-4

Ichiro: 8-2-3-7

Brian … what are you using for AB/month?

Oops … my point is that if there is no variability in your number of AB/month then the resulting variability in monthly average will be biased on the low side.

I am using the ACTUAL number of ABs for each month. The probability for each month is BASED on the sample size.

Thanks …

I agree that his monthly variability (for 20 months) is uncommonly high. In my simulations (10000 Ichiro careers, 992 randomly ordered hits in 2934 ABs) ONLY ONCE did I achieve a monthly variability as high as 0.066 (standard deviation normalized by N).

However, I do not have the AB totals for 2001 … thus I cheated …
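For anyone who wants to try the shuffle test described above, here is a minimal sketch. It is not the commenter's actual code. The hit and at-bat totals and the 0.066 figure come from the thread; the even split of at-bats across 20 months, the number of simulated careers, the seed, and every function name are assumptions made for illustration. As noted a few comments up, equal-sized months will bias the simulated variability on the low side, so real monthly AB counts would be the better input.

```python
import random
from statistics import pstdev

HITS, AT_BATS, MONTHS = 992, 2934, 20   # totals quoted in the thread
OBSERVED_SD = 0.066                      # Ichiro's monthly std dev from the thread


def month_sizes(total, months):
    """ASSUMPTION: split the at-bats as evenly as possible across months."""
    base, extra = divmod(total, months)
    return [base + 1 if i < extra else base for i in range(months)]


def simulated_monthly_sd(rng, sizes):
    """Shuffle the career's hits and outs, then measure the month-to-month spread."""
    outcomes = [1] * HITS + [0] * (AT_BATS - HITS)
    rng.shuffle(outcomes)
    averages, start = [], 0
    for size in sizes:
        averages.append(sum(outcomes[start:start + size]) / size)
        start += size
    # pstdev divides by N, matching "standard deviation normalized by N"
    return pstdev(averages)


def fraction_at_least(observed, careers=1000, seed=2005):
    """Share of simulated careers whose monthly spread reaches the observed one."""
    rng = random.Random(seed)
    sizes = month_sizes(AT_BATS, MONTHS)
    count = sum(simulated_monthly_sd(rng, sizes) >= observed for _ in range(careers))
    return count / careers


frac = fraction_at_least(OBSERVED_SD)
print(f"Share of random careers with monthly SD >= {OBSERVED_SD}: {frac}")
```

If the share printed is tiny, that is in line with the "only once in 10000" result reported above, though the equal-months shortcut means the two are not directly comparable.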

I think this article is very clear in describing the random fluctuations we will see in BA. It incorporates principles of random variation, but also mentions the human factor of slumps.

I did a bit of debating the other day with Dave’s piece, but I must say I agree with everything here. This was very well written, and very easy to understand. Very well done.

Tim … he did address random fluctuations, but did not take the next logical step and TEST the hypothesis that Ichiro’s monthly batting average variability results from “random fluctuations”.

This is what Brian Rust is doing and what I did with my simulation.

Ichiro has an unusually high month-to-month variability in his batting averages when compared to a truly random hitter (e.g., computer simulation), high enough to reject the hypothesis that it is owing to “random fluctuations.”

The variability of apparent batting average for, say, 40 at-bats (considerably less than a month, but about equal to the memory time horizon of many sportswriters) can be tremendous. A lifetime .300 hitter, who is simply hitting like a computer automaton, has over 11% chance of hitting .400 or better (16 or more hits), and over 11% chance of hitting .200 or worse (8 or fewer hits).

No simulations needed here– these chances are calculated straight from a binomial distribution.
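The binomial calculation behind those two figures can be sketched as follows. The function names are made up for illustration; the inputs (a .300 hitter, 40 at-bats, 16 or more hits, 8 or fewer hits) are the ones given in the comment.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at-bats with hit probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more hits."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def prob_at_most(k, n, p):
    """Probability of k or fewer hits."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


N_AB = 40
P_HIT = 0.300

p_hot = prob_at_least(16, N_AB, P_HIT)   # .400 or better
p_cold = prob_at_most(8, N_AB, P_HIT)    # .200 or worse

print(f"P(.400 or better over {N_AB} AB) = {p_hot:.3f}")
print(f"P(.200 or worse over {N_AB} AB)  = {p_cold:.3f}")
```

Both probabilities should land a little above 11%, matching the figures quoted above.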

#2 and #9 — under the “it’s just random fluctuations, no real streaks or slumps” assumption, any two hitters with the same average will have the same spread of variation, given the same number of ABs (not PAs). That’s whether they get that average by hitting infield singles, or by pure Ks and HRs.

This is because what “just random fluctuations” is saying is that each AB has the same underlying probability of a hit, and no memory of what happened in the last AB. The hitter is being modeled as a 1000-sided die with 300 faces saying “hit”.

So Ichiro’s singles-hitting approach doesn’t directly come into play in the discussion of whether the “just random fluctuations” hypothesis fits.
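To put a number on that spread: under the die-roll model, the standard deviation of an observed batting average depends only on the true hit probability and the number of ABs, which is why the type of hit drops out. A small sketch follows; the function name and the sample AB counts are illustrative assumptions, not anything from the thread.

```python
import math


def batting_average_sd(p_hit, at_bats):
    """Standard deviation of an observed average when every AB is an
    independent trial with the same hit probability (the die-roll model)."""
    return math.sqrt(p_hit * (1 - p_hit) / at_bats)


# ASSUMPTION: 40, 150 and 600 AB stand in for a hot streak, a month and a season.
for at_bats in (40, 150, 600):
    sd = batting_average_sd(0.300, at_bats)
    print(f"{at_bats:4d} AB: +/- {sd:.3f}")
```

For a .300 hitter over 40 ABs that works out to roughly 72 points of batting average in either direction, and it shrinks with the square root of the number of ABs.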

My mention of Black-Scholes was GREAT snark, you have to admit (my finance MBA has to be worth something), and as an aside, complex option pricing models have made a certain number of math majors good money on Wall Street, although I recommend using OPM as the best method to avoid personal disasters. But it is merely an esoteric use of statistical probabilities, and sabermetrics applied to baseball is simply another use of the same. Actually this discussion of Ichiro is a great example of the substantial limitations of statistical analysis, because of its reliance on past behavior and thus on the ability to extrapolate that behavior into the future. Think of how investing in the OTC market beginning in late 1999 prepared you for the market beginning in March of 2001. Holy Caruthers. The expectation regarding Ichiro is that it is merely his “random” fluctuation that is affecting his last couple of months, and early Ichiro will return any day now. Given his unique (and wondrous) batting style, we have no idea. Merely hope. And I still don’t like using monthly averages as the point of reference. That old Chuck Knoxism keeps popping up in my head: It’s not who you play, it’s when you play ’em.