Game 156, Athletics at Mariners: What *Are* the Mariners, Really?

James Paxton vs. Daniel Mengden, 7:10pm

It’s time for one of this blog’s patented, not terribly beloved, meta-analytics posts. If posts on the differing WAR frameworks are your cup of tea, then boil the kettle and grab some cucumber sandwiches. If not, uh, the M’s and A’s are playing a mostly meaningless game tonight, one saved from instant irrelevance by the return of James Paxton.

A few weeks ago, this post by sports statistics Prof Ben Baumer came across my Twitter timeline:

I looked at some of the responses, as it ties in to the big debate about confidence intervals that OpenWAR (Baumer was one of the developers of OpenWAR) and now Baseball Prospectus’ WARP have been leading since at least late 2017. But fundamentally, I skipped over it, because I think we’ll *always* have these big differences between a WAR framework based on RA9 (runs allowed) and one based on FIP (three true outcomes only). Aaron Nola had a ridiculously low ERA and a somewhat higher FIP, so… there you go. Similarly, James Paxton’s going to look much better in a FIP-based system than an RA9-based system, and that holds up, too. In general, things should line up a lot more for position players, and pitchers can have outliers like Nola.

As the M’s season winds down, many fans have talked about the positives they can take from this season. Despite the late collapse, and despite the M’s dicey short- to medium-term prospects, there should be some positives given how many games the M’s just won. And thus, I’ve seen a number of comments about Mitch Haniger and his ascension into a star player. I’ve loved watching Haniger this year, but I guess I’d thought of him as in a tier below the upper-echelon position players in the game, and while that distinction may be semantic, it made me go and see how the different systems rate Mitch. What I found was a distinction about as wide as the Aaron Nola example.

By Fangraphs’ WAR, Mitch grades out as the 28th-most valuable position player, at 4.4 WAR. That’s really good, and it’s driven by an excellent park-adjusted wOBA – a park adjustment that’s perhaps larger than it’s been in a while. But that offensive performance is partially balanced by some negatives on defense. First there’s the good ol’ position adjustment, which dings him for playing in an OF corner (mostly), and then there’s the actual fielding component, which at FG dings him quite a bit, especially for his performance when he’s NOT in an OF corner. What does Baseball Reference have for him? There, Haniger is the *9th-best* position player in the game at 6.3 WAR,* essentially tied with Christian Yelich of Milwaukee, and within a half-win of the Astros’ Alex Bregman. This…this is good company. Defense is a big part of this, as BBREF also dings him for being a right-fielder, but gives him 7 defensive runs. So is this all about defense?

Maybe not. Comparing the “value” tables at BBREF and Fangraphs gives us a very different idea about what Mitch’s batting stats mean for overall value. At the former, combining batting, baserunning, league, park, and position, Haniger comes out with 5.5 offense-based wins above replacement. This is clear, unambiguous star-level play, but even if we throw out defense, there’s still about a half-win or more of difference between the two systems. Maybe that’s within the margin of error (and if we had confidence intervals, we could check that), but these differences can really add up for players and it gets magnified at the team level.

Which team has the best pitching staff? By Fangraphs and Baseball Prospectus, the answer is easy: the Astros, who’ve given up an absurdly low number of runs, and who do FIP-pleasing things like racking up strikeouts. By Baseball-Reference’s measure, it’s…the Phillies. How can this be? Well, I’d argue that it literally cannot be, but to follow the logic train here, the Phillies may have given up nearly exactly the same number of runs as the M’s, but that’s actually heroic work given that they pitch in front of one of the league’s worst defenses in many years. Once you account for opposition, ballpark, and, crucially, that defense, the Phillies are *actually* giving up about a run per game less than an average team would. Contrast that with the Astros, who are giving up 3.3 runs per game, but even an average pitcher would do pretty well in the pitcher-friendly parks of the AL West and with Houston’s defense. The gap’s not nearly as large as it is for Philadelphia’s.
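To make that logic train concrete, here is a minimal sketch of a context-adjusted runs-allowed comparison. It is not Baseball-Reference’s actual formula: the function name, the simple multiplicative park factor, and every number below are assumptions, chosen only to show how a bad defense can make ordinary raw run prevention grade out as excellent.

```python
def runs_saved_per_game(actual_ra, league_ra, park_factor, defense_runs_per_game):
    """Runs per game a staff saves versus an average staff in the same context.

    defense_runs_per_game is the runs the fielders save per game
    (negative for a bad defense). All inputs used below are hypothetical.
    """
    # What an average staff would allow in this park...
    expected = league_ra * park_factor
    # ...and behind these fielders: a bad defense raises the expectation.
    expected -= defense_runs_per_game
    return expected - actual_ra


# Two made-up staffs: one with league-average raw runs allowed behind a poor
# defense, one with far fewer runs allowed in a friendly park with good fielders.
bad_defense = runs_saved_per_game(4.5, 4.5, 1.02, -0.6)
good_defense = runs_saved_per_game(3.3, 4.5, 0.95, 0.4)
```

With these invented inputs, the staff allowing 4.5 runs per game comes out ahead of the one allowing 3.3, because nearly all of the second staff’s edge gets credited to its park and its fielders. Whether the real adjustments should be that large is exactly the question.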

This result implies something about the Phillies’ position players, and it’s borne out on the offensive side of the ledger: the Phillies’ position players – as a group – have played at replacement level all year. Rhys Hoskins, whom FG has at 2.5 WAR? Replacement level. If team defense is the weird trick that makes the Phillies pitchers’ completely average runs-allowed look amazing, then it’s got to be accounted for on the position player side, and boy is it ever.

A somewhat similar thing happens with the M’s, whose offense looks better by BBREF’s numbers than it does by Fangraphs’. While Fangraphs has the M’s pitching staff ranked 10th in MLB, they slip to 14th in BBREF’s rankings. Baseball Prospectus has them 13th, though it sees their overall value much as FG does. Meanwhile, BBREF’s got the M’s 3 wins better on offense than FG. The gap isn’t huge, perhaps, but at some point, this Mariners front office is going to have to triage its needs for 2019, and these gaps add up. Essentially, these differences are large enough that, depending on the source, you could argue that the M’s should invest in their offense *first* or you could argue that they need to shore up their pitching staff.

If you work for the M’s (or Phillies), you’ll have your own internal data that can probably shed some light on this, but whatever it is, it will rest on some pretty fundamental questions of value, and THOSE assumptions will drive the output. This extended look at the gaps between the publicly-available sites just highlights how those slight differences in assumptions can drive massive differences in the final computation of value. This is obvious when you compare the distribution of pitching WAR between FG, BBREF and BP. Baseball Prospectus’ new DRA-based pitcher WAR is fascinating to look at, because it doesn’t really line up with either of the previous approaches. Using mixed models, it creates a per-plate-appearance run estimator based on a ton of different variables, from the park to the umpire. One of the issues many in this field have pointed out is that actual runs allowed gives a much wider distribution than many run estimators, like FIP. You can reduce error (and have solid correlations with future runs allowed) by narrowing the distribution; regression toward the mean is great, and it works, but it can sometimes feel like doing that just eliminates the differences between really good and really bad pitchers. Well, DRA isn’t going to have that problem. The gap between the best and worst pitching staffs in Fangraphs’ FIP-based WAR (tighter distribution) is 26 wins. In BBREF’s runs-allowed system, it shoots up to 37 wins, with Miami running out a staff that’s a shocking 8 wins below replacement level. DRA-based WARP ups that distribution even further, at nearly *47* wins between the Astros and Rangers. They’ve got 5 teams with multiple wins below replacement level, with the Rangers coming in at an unfathomable 12.6 wins below what you’d get if you just swapped in the Tacoma Rainiers’ staff. I…I don’t believe that can possibly be true, but it’s nice to see a distribution that doesn’t minimize the gap between the Astros and, say, the Orioles.
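The trade-off between a tight and a wide distribution can be sketched in a few lines. This is only an illustration of regression toward the mean in general, not how FIP, RA9-based WAR, or DRA is actually computed; the function names, the 0.7 weight, and the team totals are all made up for the example.

```python
def regress_to_mean(values, weight):
    """Pull each value toward the group mean; weight=1.0 keeps the raw value."""
    mean = sum(values) / len(values)
    return [mean + weight * (v - mean) for v in values]


def spread(values):
    """Gap between the best and worst entries."""
    return max(values) - min(values)


# Made-up team pitching WAR totals, not real 2018 figures.
raw = [25.0, 18.0, 12.0, 6.0, -5.0]
regressed = regress_to_mean(raw, 0.7)
```

Here the best-to-worst gap shrinks from 30 wins to 21 while every team keeps its rank. A heavily regressed estimator can predict better and still understate how far apart the best and worst staffs really were.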

However you set up your system, you’ve got to balance the reduction in bias from accounting for park, league, umpire, whatever, with increases in variance/noise. Everyone’s looking at the same basic data: the M’s have scored too few runs and given up a few more. But what you do with the data is essentially limitless. I just hope the M’s can figure out how to use that data to coax some real improvement out of their young hitters. Failing that, I’d just like them to get a real, meaningful picture of where they stand vis-à-vis their likely rivals moving forward.

1: Haniger, CF
2: Segura, SS
3: Cano, 1B
4: Cruz, DH
5: Span, LF
6: Seager, 3B
7: Gamel, RF
8: Zunino, C
9: Gordon, 2B
SP: Paxton!

* Baseball Prospectus, by the way, essentially splits the difference. Mitch Haniger isn’t 28th or 9th, he’s…17th. They’ve got Mitch as a plus defender as well, and given their higher replacement level baseline, his 5.5 WARP is closer to BBREF’s than it is to Fangraphs’ 4.4. They agree with Fangraphs on James Paxton, though, whose higher ERA hurts his value at BBREF.


3 Responses to “Game 156, Athletics at Mariners: What *Are* the Mariners, Really?”

  1. don52656 on September 24th, 2018 10:39 pm

    While having to deal with the different methods of measuring value is certainly a challenge, it seems to me that the bigger challenge is trying to project what the value will be for the next season. In most of the projection systems I’ve seen, the projected Mariners 2018 leader for WAR was….Kyle Seager, with Mitch Haniger well down the list among Mariners.

    It’s funny, because we’ve heard time and time again that DiPoto/Servais are analysis-friendly, yet I see time and time again Ben Gamel (with lifetime .264/.311/.408 splits against LHP) being pinch hit for by guys who have career worse performance against LHP, like Maybin (.247/.310/.345).

    We’re eliminated, yet Daniel Vogelbach continues to rot on the bench. He’s been in parts of three MLB seasons with the team and has a whopping 26 plate appearances against LHP. Yeah, he’s hit only .087 in those appearances, but this is a guy who has hit .291/.411/.496 in three AAA seasons. He hit .250/.407/.338 against lefties this year in Tacoma, wouldn’t it be nice to get him some experience against LHPs at the major league level?

    As far as I’m concerned, there are simply too many occurrences that appear to demonstrate that the organization isn’t good at recognizing talent levels, much less developing those talents.

  2. heyoka on September 25th, 2018 3:23 am

    If there was a baseball-reference anonymous group, I would be in it. I’m literally less productive at work because of that site.
    Naturally I want their Mitch Haniger rating system to be the correct one……
    But from countless hours of staring at history’s players, I have a feeling their dwar isn’t always correct…..and that’s not just because they do this funky thing where you can’t simply add owar and dwar together.
    A good analysis would be to take a team’s war total, add it to 40 and see how it rates vs actual performance, sorta like a war pythag.

  3. 3cardmonty on September 25th, 2018 1:24 pm

    Excellent post Marc, I love digging into the differences between the WARs. Haniger certainly doesn’t seem like a plus defender by the eye test. His reads are kind of terrible? Am I being overly harsh?

    Don brings up an interesting point–one would think that having an agreed upon evaluative system would be a prerequisite to having an accurate projection system. More and more, I think even teams internally have neither.

    Dipoto is clearly a second-division GM. But I have absolute faith that if he were fired, ownership could easily manage to find someone worse to replace him.

    Heyoka, I actually did such an analysis recently when the Mariners’ pythag was looking historically nutso. You can find it in the comments to this post–the comment with an image embedded:

