by Keith Law
Of course, I did have to take sides on this issue in a very public way once, and it put me in the middle of a controversy I really wanted no part of.
In 2009, I voted on a BBWAA postseason award for the first time, and was given a ballot for the NL Cy Young Award, which in that particular year was quite competitive, with incumbent winner Tim Lincecum turning in another great year, but two Cardinals pitchers, Adam Wainwright and Chris Carpenter, also having outstanding seasons, along with Atlanta starter Javier Vazquez and Arizona starter Dan Haren. Those five guys were clearly the top five pitchers in the league, but how you ranked them depended a lot on how you viewed the job of the pitcher and separated that out from the contributions of his defense.
At the time, the Cy Young ballot contained just three spots for names (shortly afterward the BBWAA added two more spots, probably in response to this situation), so I listed Tim Lincecum first, Javier Vazquez second, and Adam Wainwright third. That ordering reflected a defense-neutral way of valuing their contributions, which in that particular year discounted the Cardinals pitchers’ performances a little relative to the other three names. Where it became controversial, however, was around the omission of Carpenter, who led the NL in ERA that year but did so with only 192 innings pitched, fewer than any of the other four pitchers I mentioned above and a lower IP total than any starting pitcher who won the Cy Young Award in any full season since the award began in 1956.
If you followed the previous section, though, you can see why that lower workload moved Carpenter off my ballot of the top three pitchers. Pitching is about run prevention, yes, but the more you pitch and prevent runs, the more value you deliver. Carpenter had a lower ERA than Vazquez, but about the same FIP, and Vazquez provided his team with another 27 innings of above-average pitching over what Carpenter gave St. Louis. Since Carpenter’s innings total was so low by historical standards, too, I didn’t think my ballot would garner any notice at all, figuring a lot of voters would omit Carpenter for that reason, while Vazquez, who finished second in the NL in strikeouts behind Lincecum, would grab a few second- and third-place votes.
That . . . is not what happened. I was the only voter to include Vazquez on any ballot at all, and one of only two to leave Carpenter off my ballot entirely. This sparked a lot of silly outrage from a subset of Cardinals fans, many of whom couldn’t do simple math and claimed that I’d somehow cost Wainwright the Cy Young by listing him third (putting him first and Lincecum second on my ballot would not have changed the overall result). It also had the peculiar result of handing Vazquez a $70,000 bonus for finishing fourth in the Cy Young voting, even though he had appeared on just that one ballot. I’ve still never met Vazquez in person but I think I’d let him buy me a cup of coffee at Gustos the next time I’m in San Juan.
As for whether my ballot was “right,” I just don’t know. I think we’ll be debating the separation of pitcher value from defensive contributions for some time to come, and it’s quite possible that I’ll look back on that ballot at some point and wish I’d listed the pitchers in a different order. Or not.
Regardless of whether you’re valuing a hitter or a pitcher, the final piece of the WAR calculation comes from a player’s value in the field. I discussed fielding value in a previous chapter, saying that whatever metric you use, whether it’s a public one like dRS or UZR or a proprietary one developed by a team using MLB’s own data, you’re going to add up the values of the plays the fielder made and that he didn’t make, comparing them to the probability of an average fielder making those plays. A shortstop who makes every play an average shortstop makes and then makes ten extra plays to his left and ten to his right will grade out a few runs above average, because those twenty plays would amount to “singles prevented.” An outfielder could get the same bump in fewer plays, because an outfielder who catches a ball that is rarely caught is probably preventing extra bases.
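As a rough sketch of that bookkeeping, the snippet below credits a fielder on each ball in play with the difference between what he did and what an average fielder would be expected to do, weighted by the runs at stake. The class name, the field names, and every probability and run value here are hypothetical inputs chosen for illustration; they are not taken from dRS, UZR, or any team's system.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Opportunity:
    avg_out_prob: float  # assumed chance that an average fielder makes this play
    made: bool           # whether the fielder being rated made the play
    run_value: float     # assumed runs saved by turning this ball into an out


def fielding_runs_above_average(opps: Iterable[Opportunity]) -> float:
    """Sum (actual outcome - average expectation) * runs at stake over all plays."""
    total = 0.0
    for o in opps:
        credit = (1.0 if o.made else 0.0) - o.avg_out_prob
        total += credit * o.run_value
    return total


# A routine play made earns little credit; a rarely made play earns a lot;
# a routine play missed costs almost the full run value.
sample = [
    Opportunity(avg_out_prob=0.9, made=True, run_value=0.75),
    Opportunity(avg_out_prob=0.2, made=True, run_value=0.75),
    Opportunity(avg_out_prob=0.9, made=False, run_value=0.75),
]
print(fielding_runs_above_average(sample))
```

An outfielder's opportunities would carry a larger assumed `run_value`, since the ball he catches might otherwise have gone for extra bases, which is why he can reach the same total in fewer plays.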
Once you have all of the components of a player’s value, you can simply add them up and get a total number of runs of value produced or prevented relative to the average player. If you divide that number by the number of runs of value needed to produce one additional win at the team level—it’s usually right around 10, but does vary slightly as offensive levels change—then you can express the player’s total value as Wins Above Average rather than Runs Above Average, but they say the same thing. For position players, we compare their production to the production of other players at the same specific positions, as the expected offensive output for a shortstop or catcher is far below the expected output for a first baseman or a designated hitter.
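The conversion described above is simple division, and a minimal sketch makes that plain. The function and parameter names are hypothetical, the component run totals are made-up inputs, and the default of 10 runs per win is the approximate figure from the text rather than a value fitted to any particular season.

```python
def wins_above_average(batting_runs: float,
                       baserunning_runs: float,
                       fielding_runs: float,
                       runs_per_win: float = 10.0) -> float:
    """Add up the run components (all relative to average) and convert to wins."""
    runs_above_average = batting_runs + baserunning_runs + fielding_runs
    return runs_above_average / runs_per_win


# A hypothetical player: +25 runs with the bat, +3 on the bases, +2 in the field.
print(wins_above_average(25, 3, 2))  # 30 runs above average at 10 runs per win
```

Passing a slightly different `runs_per_win` is how the same run total would translate into fewer or more wins as offensive levels change.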
WAR, however, uses a different baseline, one that trips up a lot of fans and media because it is a novel concept not found outside of baseball. WAR stands for Wins Above Replacement, and rather than using a league-average player for that position, it uses a “replacement-level” player, a calculated level of production equal to what you’d expect a typical player recalled from the minors to be able to provide to a major-league team for the minimum salary. In other words, if the major leaguer whose value you’re examining were to get hurt, and his team had to call up some random triple-A player to fill the spot, you’d expect to get replacement-level production, which would mean a WAR of zero.
This isn’t perfect, and to some extent, it’s widely used because it’s what we have, and it’s what teams use in their own internal metrics. (And perhaps because it’s a lot easier to talk about someone’s “war” than to talk about his “waa.”) When readers attempt to argue with me over the concept of a replacement-level player, I point out that it’s just a baseline, and you can use a different baseline if you wish without getting different results—it won’t change the way you rank players’ production, but will change what their relative values appear to be. We speak of a player who was worth 6 WAR being worth twice as much as the player who was worth 3 WAR, but that’s not necessarily true; if we used the league average as a baseline, those numbers might drop to 4 and 1, respectively.
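To see why the choice of baseline leaves the rankings alone but distorts the ratios, consider a small sketch using the 6 WAR and 3 WAR figures above. The two-win gap between replacement level and league average is taken from that same example; the player labels and function name are hypothetical.

```python
def rebaseline(values: dict, offset: float) -> dict:
    """Shift every player's value by the same amount (a change of baseline)."""
    return {name: v - offset for name, v in values.items()}


war = {"Player A": 6.0, "Player B": 3.0}
waa = rebaseline(war, 2.0)  # assume average sits 2 wins above replacement

# The ordering of players is the same under either baseline...
rank_war = sorted(war, key=war.get, reverse=True)
rank_waa = sorted(waa, key=waa.get, reverse=True)

# ...but the apparent ratio between them is not.
ratio_war = war["Player A"] / war["Player B"]
ratio_waa = waa["Player A"] / waa["Player B"]
print(rank_war == rank_waa, ratio_war, ratio_waa)
```

Subtracting a constant never reorders a list, but it does change every ratio, which is why "twice as valuable" is a statement about the baseline as much as about the players.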
My real argument on replacement level, despite my misgivings over it, is that you have to pick some baseline and stick with it. The level used by Baseball-Reference and Fangraphs is arbitrary, and it might not be the best choice of baseline, but as long as it’s transparent, then we can work with the results it gives us. Those sites both use a .294 team winning percentage, which translates to about a 48-114 record, as their replacement level, meaning that a team full of replacement-level players would be expected to post a .294 winning percentage over a full season. (So, yes, the 2003 Detroit Tigers, who went 43-119 and were by far the worst major-league team I’ve ever seen myself, were worse than replacement level.) A team with Mike Trout, who has averaged just about 10 WAR per full season in the majors, and 24 replacement-level players would be expected to go 58-104.
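The records quoted above follow from a few lines of arithmetic, sketched here. The .294 winning percentage and the 162-game schedule come from the text; the function name and the decision to round to the nearest whole win are assumptions made for illustration.

```python
REPLACEMENT_WIN_PCT = 0.294  # baseline quoted in the text
GAMES = 162                  # full regular-season schedule


def expected_record(total_war: float,
                    games: int = GAMES,
                    replacement_pct: float = REPLACEMENT_WIN_PCT) -> tuple:
    """Expected (wins, losses) for a roster whose players sum to total_war."""
    wins = round(replacement_pct * games + total_war)  # rounding is an assumption
    return wins, games - wins


print(expected_record(0))   # a roster made up entirely of replacement-level players
print(expected_record(10))  # one 10-WAR player plus 24 replacement-level players
```

The same function gives a quick check on any roster: sum the players' WAR, add it to the replacement baseline, and compare the result with the team's actual win total.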
Adding up the WAR numbers for all players on a team won’t always match a team’s win total (plus the 48 wins as a baseline), but it should get us close in most cases and over a large enough sample we’d expect them to coincide. The performance of a single team in a single season can be affected by performances in high-leverage situations, by managerial tactics (for better or for worse), strength of schedule, and so on, but if you look at hundreds of team-seasons, you’d expect their WAR totals to approach their win totals over 48—or else you need to reexamine the weights you’re using for batting, pitching, or fielding runs.
Offense, defense, baserunning, run prevention: that’s everything that goes into WAR, so when I say that WAR isn’t a stat, but a construct, I hope now you can see the difference. WAR is just a way of adding up the various measurable components of a player’s performance, setting a standard baseline, and using it to compare player value across seasons, leagues, even across eras. When writers mock WAR (and do so with headlines that should prompt Edwin Starr to sue for defamation), they think they’re mocking a single stat when they’re actually mocking the idea of valuing players at all. If you don’t think you can value a player, I guess that’s fine, but the entire industry has decided that it disagrees.
For a team’s general manager or president—don’t get me started on baseball title creep—to decide how much to pay a player, or how much to give up in trade for a player, he has to know what the player’s past production has been worth, and will probably ask his analytics department to provide a projection for what that player’s future production is likely to be as well. Each team will value production slightly differently, especially for the less definitive areas of valuation like fielding or pitching, but you can’t begin the conversation about whether to pay Joey Bagodonuts $12 million a year or $14 million a year unless you at least know what his production last year was worth, or the year before that, or what your analysts say he’s likely to be worth next year.
General managers face two major constraints when it comes to signing or acquiring major-league players: money and roster spots. Each team has its own payroll budget, a function of its market size, TV rights fees, and owner wealth. And each team has a 25-man active roster for major-league players, along with a 40-man roster made up of those 25 players plus those on the major-league disabled list and minor leaguers with enough experience to require their inclusion on it. That means every decision about players (signing one, releasing one, trading one or several for one or several, and so on) occurs in the context of what else you could do with that money and that roster spot. (The same is true, with different constraints, in the draft.) If you, the GM, want to make these decisions in a rational way, you need to have a system that puts a single value on a player’s past production or on a player’s projected future production. You may call the system whatever you want, but it will end up as Runs or Wins Above Some Baseline. Without such an objective system, how could you decide which player to add or drop, or where to spend your limited resources or roster spots?
Teams operated in an informational vacuum for nearly all of MLB history, including the first thirty years of free agency, and the result was unsurprising: free agents were frequently mispriced and were largely paid for past performance they weren’t likely to repeat. Today’s MLB front offices aren’t just using the WAR figures you find on public websites; they are calculating their own total values for players via proprietary formulas and using the results to make better decisions. Mocking the mere concept only serves to separate the Luddites in the media and audience from those who believe they should at least understand how front offices operate before they think of critiquing those decisions.
This is how the industry works today, and if you don’t like it, build a better formula yourself.
PART THREE
Smarter Baseball
15
Applied Math:
Looking at Hall of Fame Elections Using Newer Stats
The 2016 World Series may have been an inflection point in the history of baseball analysis, as two of the most overtly analytically focused front offices met in a series that saw both clubs dispense with much conventional wisdom on player usage—especially in how they deployed their pitchers—and led to the end of the longest championship drought in American professional sports.
The Chicago Cubs, who defeated the Cleveland Indians in seven games, had not even appeared in the World Series since 1945 and hadn’t won the title since 1908, but ended both streaks thanks in part to the adoption of various analytical tools by their front office and field coaching staff. Led by President of Baseball Operations Theo Epstein, who had previously led the Red Sox to two World Series titles (including one that ended their own eighty-six-year drought) as their general manager, the Cubs overhauled their organization to build a top-flight analytics department, hired the analytically minded Joe Maddon as their field manager, and changed the team’s entire philosophy on drafting and acquiring players. They became the majors’ best defensive outfit in large part due to their use of advanced data to determine where to position fielders for each batter, recording arguably the best team performance in converting balls in play into outs (relative to their league) since World War II.
In the postseason, the impact of statistical analysis and just the new information that has become available to MLB teams in the last few years was even more stark. Not only have these defensive shifts become commonplace, but both managers, Maddon and Cleveland manager Terry Francona, treated the pitcher win and the save like the fetid anachronisms they are, pulling starters before they’d reached the five-inning threshold and using their best relievers in innings too early to count as save situations. Cleveland entered the postseason with two of its three best starters on the disabled list, yet came within a few outs of winning its first World Series since 1948, thanks in no small part to Francona’s tactical maneuvers and the performance of the team’s best reliever, Andrew Miller. We’d seen glimpses of this kind of managing previously (in 2014, the Giants won game seven of the World Series when starter Madison Bumgarner came out of the pen on short rest to shut the Royals down), but never to the extent we saw in the 2016 postseason. The gap between managers who used new information and insight and those who didn’t was obvious even to casual fans. The game itself may not have changed, but the way it’s played and managed has changed forever.
The statistical revolution that started around the time of the publication of Moneyball has altered the fabric of the game at a permanent and fundamental level. In the 2016–17 offseason, the last two significant holdouts to embracing statistical analysis as a core part of their baseball operations departments, the Twins and the Diamondbacks, began building such capabilities. There is no returning to the days where gut feelings and guesswork ruled baseball decision-making, and the revolution has only made teams thirsty for more data.
While all of these statistical advances, as well as those on the horizon, will continue to reshape the game for years, their effects will be felt in very different ways at all levels—from the scouts in the field looking for talent, to the players in the actual games, to the way that writers (God help us) vote on their Hall of Fame ballots. Taking that last point first, I’ll spend some time as most baseball fans online do: using modern statistical tools to engage in long and utterly futile debates over who belongs in the Hall of Fame.
WAR, however you choose to calculate it, is the right tool for a job like evaluating a player’s career. The Baseball Writers’ Association of America votes each winter on which players to elect to the Baseball Hall of Fame in Cooperstown, New York, but until very recently such votes were based either on emotion and gut feeling or on the kind of incomplete or downright misleading traditional stats I discussed in Part One. The guidelines for Hall of Fame voters are vague, and there are long, ongoing discussions even among serious voters and other writers about what it means to be a Hall of Famer. Are we talking about the player’s value at his peak? His longevity? Consistency? A little of everything?
If nothing else, though, we can talk about some of the more egregious mistakes in BBWAA voting history, and even look at some of the current debates around whether there’s any place at all for a modern closer in Cooperstown, using these advanced stats, including WAR, to provide the debate with an objective foundation.
Lou Whitaker was one of the best infielders of the 1980s, and at the time of his peak, he and shortstop Alan Trammell were widely presumed to be on track for the Hall of Fame—although such presumptions are often built on the belief that the players won’t decline quickly in their thirties. (Just ask fans of Dale Murphy, who was about as clear a Hall of Famer through age thirty as you’ll find, but was finished as an average regular by age thirty-two.) At the time of Whitaker’s lone appearance on the Hall ballot in the 2000–01 offseason, he ranked fourth all-time in home runs by a second baseman, behind Hall of Famer Joe Morgan and future Hall of Famers Ryne Sandberg (elected in 2005 on his third ballot) and Joe Gordon (named by the Veterans Committee in 2009).*
Whitaker also ranked fifth at that time in doubles by a second baseman, behind four Hall of Famers, with 420. I don’t care for RBI as a measure of hitter performance, but voters do (and did even more in 2000 than they do today), and Whitaker was fifth all-time among second basemen in that category as well, and fourth in runs scored, and fourth in walks drawn, and sixth in hits. Among second basemen with at least 3,000 plate appearances in the majors, he ranked 20th in batting average at .276, 12th in OBP at .363, and 8th in slugging at .426. These are all stats you could have found on the back of Whitaker’s baseball card, and all of them other than his batting average make a strong case for his election. He was one of the five greatest second basemen in history when he first appeared on the ballot.
That first appearance was also his last, unfortunately. The 2001 Hall of Fame class included two players inducted in their first year of eligibility: Dave Winfield, who appeared on 84.5 percent of the 515 ballots cast, and Kirby Puckett, who appeared on 82.1 percent. Whitaker appeared on just 15 ballots, 2.9 percent of the total, even though he had a better career than Puckett or Winfield. Because he failed to meet the 5 percent minimum to stay on the ballot, Whitaker’s name did not appear again.
In fact, Whitaker appeared on fewer ballots than pitcher Dave Stewart (38 ballots), whose 3.95 career ERA would be the highest of any pitcher in the Hall; Davey Concepcion (74 ballots), whose .322 OBP would be the fifth worst in the Hall and whose .357 slugging percentage would be the fifth worst from the live-ball era; and Jack Morris (101 ballots), whose 3.90 career ERA would also have been the highest in the Hall . . . and who was a huge beneficiary of the defense provided by Whitaker and Trammell behind him.
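The ballot arithmetic above is easy to verify. The sketch below uses only the vote counts and the 515-ballot total quoted in the text; the function names are hypothetical, and it reads the 5 percent minimum as "at least 5 percent of ballots cast," which is an assumption about how the rule is applied.

```python
import math

BALLOTS_CAST = 515  # total ballots in the 2001 election, per the text


def ballot_share(votes: int, ballots: int = BALLOTS_CAST) -> float:
    """Percentage of ballots on which a player was named."""
    return 100.0 * votes / ballots


def min_votes_to_stay(ballots: int = BALLOTS_CAST, threshold_pct: float = 5.0) -> int:
    """Smallest whole number of votes that reaches the threshold (assumed 'at least')."""
    return math.ceil(ballots * threshold_pct / 100.0)


votes_2001 = {"Whitaker": 15, "Stewart": 38, "Concepcion": 74, "Morris": 101}
needed = min_votes_to_stay()
for name, votes in votes_2001.items():
    status = "stays" if votes >= needed else "dropped"
    print(f"{name}: {ballot_share(votes):.1f}% ({status})")
```

Under that reading, Whitaker fell 11 votes short of the number needed to appear on a second ballot.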