by Keith Law
Yes, but weren’t we talking about at bats a moment ago? What’s this with plate appearances? Indeed, that bait-and-switch exposes batting average’s first major flaw. Batting average doesn’t tell you how often a player gets a hit, but how often he gets a hit ignoring times he draws a walk, gets hit by a pitch, hits a sacrifice fly, makes a successful sacrifice bunt, or reaches via catcher’s interference. Those scenarios don’t count as at bats, but do count as plate appearances. (The first three count for the purposes of on-base percentage, a stat so valuable it will get its own chapter later in the book.)
So why does batting average ignore all of these other events, which in some extreme cases can account for more than a third of a player’s trips to the plate? (Barry Bonds is the only MLB player in history whose plate appearances were more than 50 percent higher than his at bats, and he did it twice, in 2002 and 2004.) Because . . . well, there’s no really good explanation for this. I mentioned above the most likely theory, that when Chadwick created the stat, those other events were rare or just weren’t considered the result of a hitter’s skill or effort, so he chose to omit them. This alone should tell you why using batting average by itself, or even just as your primary metric, to evaluate a hitter leaves out far too much crucial information. Leaving walks drawn, an important skill for a hitter, out of both the numerator (just hits) and the denominator (at bats) gives you only a portion of the hitter’s season.
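The at-bat bookkeeping is easy to sketch in Python. This is a minimal illustration, not an official scoring implementation: the function names and the season line are hypothetical, invented for the example. It backs at bats out of plate appearances by subtracting the non-at-bat events listed above, then shows how much of the hitter’s season batting average never sees.

```python
def at_bats(pa: int, bb: int, hbp: int, sf: int, sh: int, ci: int) -> int:
    """At bats = plate appearances minus the events that don't count as at bats:
    walks (bb), hit by pitch (hbp), sacrifice flies (sf), sacrifice bunts (sh),
    and catcher's interference (ci)."""
    return pa - (bb + hbp + sf + sh + ci)


def batting_average(hits: int, ab: int) -> float:
    """Hits divided by at bats; every hit counts the same."""
    return hits / ab


def share_ignored(pa: int, ab: int) -> float:
    """Fraction of plate appearances that batting average leaves out entirely."""
    return (pa - ab) / pa


# Hypothetical season line, invented for illustration.
pa, hits = 600, 153
ab = at_bats(pa, bb=80, hbp=5, sf=5, sh=0, ci=0)
print(ab)                                   # 510
print(f"{batting_average(hits, ab):.3f}")   # 0.300
print(f"{share_ignored(pa, ab):.0%}")       # 15%
```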
The sins of batting average, though, are not just of omission. The numerator is even more flawed than you’d think, because it treats all hits as equal—a single and a home run both carry the same weight in batting average, even though we know they carry substantially different weights in the game.
So what does batting average really tell us about what a hitter did over some period of time? It tells us how often he got a hit in trips to the plate where he didn’t walk or get hit by a pitch or hit a sac fly or bunt or have some other very rare thing that isn’t actually an at bat happen, and it only tells us that he got a hit but not what kind of hit. (Hence the old baseball axiom, often heard after a weakly hit infield single, “It’ll look like a line drive in the box score.”) It’s a bad tradition, but it has stuck with us for well over a century and still carries undue importance in discussions and evaluations of hitters, especially those who lead the league in batting average because we say they “won” something. It’s often confusing because hitters who hit for high batting averages are generally good hitters, period; we’re not getting totally false information from the stat, but we’re misled by its false precision, acting as if going to the third decimal place is a summary judgment on the player. To see the full extent of the flaws in batting average, it helps to compare it to stats that are better equipped.
One basic statistical tool I’ll use often in this book is correlation analysis, where I compare two columns of data to each other and get a number, the correlation coefficient, that tells us how strongly correlated the two are: that is, how much the two columns move together. Strictly speaking, the coefficient runs from -1 to 1, where a negative value means the two columns tend to move in opposite directions, but the comparisons in this chapter all land between 0 (no correlation at all) and 1 (perfect correlation). The higher the number, the greater the correlation between the two stats, meaning that when stat A moves, stat B tends to move with it more consistently. This does not mean A causes B or B causes A; you’ve probably heard someone say “correlation does not prove causation” at some point, because all a correlation analysis can tell us is whether two statistics appear to be related. It could be a direct cause and effect, or it could be coincidence, but this tool only tells us to what extent the two numbers move together. In this book, I will often refer to a correlation between two statistics by saying that one “predicts” the other.
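For readers who want to see what a correlation coefficient actually computes, here is a minimal Python sketch of the standard Pearson correlation. The text doesn’t say which correlation method produced the team numbers that follow, so Pearson is an assumption here. The team figures in the example are made up for illustration, not real MLB data.

```python
import math
import statistics


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length columns.

    Returns a value from -1 to 1: near 1 means the columns rise together,
    near 0 means no linear relationship, near -1 means they move oppositely.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of equal length, at least 2 rows")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Do the two columns sit above or below their own means at the same time?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a column with no variation has no defined correlation")
    return cov / (spread_x * spread_y)


# Made-up team numbers, for illustration only (not real MLB data).
team_avg = [0.240, 0.250, 0.255, 0.262, 0.270]
runs_per_game = [3.9, 4.1, 4.5, 4.3, 4.8]
print(round(pearson_r(team_avg, runs_per_game), 3))

# Python 3.10+ ships the same calculation as statistics.correlation.
if hasattr(statistics, "correlation"):
    print(round(statistics.correlation(team_avg, runs_per_game), 3))
```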
In the table below, I used MLB team stats from the five seasons from 2011 through 2015 to show the correlations between four commonly used hitter-rate stats at the team level and those teams’ runs-scored-per-game figures:
Team Stat | Correlation to Team R/G
Batting average | 0.749
On-base percentage | 0.833
Slugging percentage | 0.903
OPS | 0.936
Batting average correlates pretty well to runs scored, with a coefficient of about 0.75. While this doesn’t show causation, it stands to reason that if a team as a whole is getting more hits during its (arbitrarily, narrowly defined) at bats, it will score more runs. But batting average fares worse than the two other common rate stats used for evaluating hitters: on-base percentage and slugging percentage. On-base percentage, or OBP, does just what it claims to do, taking the times a hitter reached base safely via a hit, a walk, or a hit by pitch, dividing that by all plate appearances other than sacrifice bunts and times reached on interference, and giving the frequency with which the hitter gets on base. A hitter with a .400 OBP, which would put him among the league leaders, reached base in 40 percent of those plate appearances, meaning he made an out of some sort in the other 60 percent. Of all basic batting stats (those you might find on the back of a baseball card or in a game program), OBP is probably the most important for telling you about a hitter’s ability to produce.
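The OBP definition translates directly into a formula. Here is a minimal sketch, assuming the standard counting categories (hits, walks, hit by pitch, at bats, sacrifice flies); the function name and the hitter’s numbers are hypothetical.

```python
def on_base_percentage(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """Times on base via hit, walk, or hit by pitch, divided by at bats plus
    walks, hit by pitch, and sacrifice flies. Sacrifice bunts and catcher's
    interference are left out of the denominator."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Hypothetical hitter: 150 hits, 70 walks, 5 HBP, 520 at bats, 5 sac flies.
obp = on_base_percentage(h=150, bb=70, hbp=5, ab=520, sf=5)
print(f"{obp:.3f}")  # 0.375: on base in 225 of 600 counted trips
```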
Slugging percentage is calculated like batting average but no longer treats all hits as equal. The denominator (the bottom of the fraction) remains at bats, but the numerator changes from hits to total bases. A single counts for one total base, a double two, a triple three, and a home run four. This isn’t an accurate reflection of their relative values; a home run isn’t worth four times as much to an offense as a single, but something like twice as much. It does create some needed separation between hit types, however, and you can see that it correlates extremely well to runs scored at the team level. If you hit for more power, you’re going to score more runs. (In fact, home runs per plate appearance all by itself has a coefficient of correlation of 0.623 with runs scored over this same sample—ignoring absolutely everything else a team does, home runs still drive a substantial fraction of run-scoring.)
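Here is the same idea for slugging, using two hypothetical hitters with identical .300 batting averages but very different mixes of hits. Batting average can’t tell them apart; total bases can. The function names and stat lines are made up for illustration.

```python
def total_bases(singles: int, doubles: int, triples: int, homers: int) -> int:
    """One base per single, two per double, three per triple, four per homer."""
    return singles + 2 * doubles + 3 * triples + 4 * homers


def slugging(singles: int, doubles: int, triples: int, homers: int, ab: int) -> float:
    """Total bases divided by at bats (the same denominator as batting average)."""
    return total_bases(singles, doubles, triples, homers) / ab


# Two hypothetical hitters, each with 150 hits in 500 at bats.
slap = dict(singles=150, doubles=0, triples=0, homers=0)
power = dict(singles=100, doubles=20, triples=0, homers=30)

for name, line in [("slap", slap), ("power", power)]:
    avg = sum(line.values()) / 500  # hits / at bats
    print(name, f"{avg:.3f}", f"{slugging(**line, ab=500):.3f}")
# slap 0.300 0.300
# power 0.300 0.520
```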
OPS, which stands for “On-base Plus Slugging,” is a kludge stat, a brute-force addition of OBP and slugging percentage that is deeply flawed at a basic math level, yet it has gained momentum in popular discussions of the sport, including media coverage, because it kind of works: you can see it correlates better with run-scoring than either OBP or slugging does individually. OPS is popular and problematic enough to merit its own section later in the book, but for now, its purpose here is to show how much information is missing from batting average. If these other rate stats correlate better to team run-scoring, and they’re all easily available at the individual player level, what, exactly, is batting average actually good for?
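The kludge is easy to see when OPS is computed from raw counts. This minimal sketch (hypothetical function and numbers) uses Python’s fractions module to keep the arithmetic exact and makes the two different denominators explicit.

```python
from fractions import Fraction


def ops(h: int, bb: int, hbp: int, sf: int, ab: int, tb: int) -> Fraction:
    """On-base plus slugging, computed exactly with fractions.

    The two parts have different denominators: OBP divides by
    AB + BB + HBP + SF, while SLG divides by AB alone. OPS simply adds
    the two ratios anyway.
    """
    obp = Fraction(h + bb + hbp, ab + bb + hbp + sf)
    slg = Fraction(tb, ab)
    return obp + slg


# Hypothetical hitter: 150 H, 70 BB, 5 HBP, 5 SF, 520 AB, 260 total bases.
value = ops(h=150, bb=70, hbp=5, sf=5, ab=520, tb=260)
print(value, float(value))  # 7/8 0.875
```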
Despite the deficiencies of batting average, the “batting champion” tag still matters quite a bit within baseball circles, especially where fans and the media are involved. The title features prominently on several Hall of Fame plaques, including the three players cited at the top of the chapter, and becomes a talking point in Hall of Fame elections, but perhaps most important, it’s a primary focus for postseason award balloting and often used as a justification for voting for players who were not in fact the best hitters in the league.
In 2007, Detroit Tigers outfielder Magglio Ordoñez led the American League in batting average at .363, but he wasn’t the best offensive player in the league because he didn’t do enough besides hitting for average. The best offensive player in the league was Alex Rodriguez of the New York Yankees, who led the AL with 54 homers and a .645 slugging percentage, so while he “only” hit .314, he produced more total value with the bat. He had 26 more home runs than Ordoñez and drew 19 more walks, so the total value of all of his contributions—considering the values of all those hits, walks, and extra bases, compared to the number of outs he produced—exceeded that of Ordoñez, even before we consider things like defense. Rodriguez did win the AL MVP award that year, although two Detroit-based writers, Tom Gage and Jim Hawkins, made the absolutely-not-biased-at-all decision to list Ordoñez, the local player, first on their ballots. Gage specifically cited Ordoñez’s batting average in defending his vote, dismissing home runs as “a glamour stat.”
Similarly, the Miami Marlins’ Dee Gordon led the National League in batting average in 2015 at .333, but the statistics site Baseball-Reference.com doesn’t list Gordon among the top ten in the National League in “Adjusted Batting Runs” (ABR), an advanced metric that does just what I described above: assigns weights to different offensive events and adds ’em up. Bryce Harper led the NL in just about everything else, winning the NL MVP award unanimously. (Gage and Hawkins are no longer active award voters and wouldn’t have voted on a National League award as members of the Detroit chapter.) You can see below just how large the gap between Gordon and Harper was, even though Gordon led Harper in batting average:
Harper got on base more, hit for far more power, and made 75 fewer outs. Gordon’s .003 advantage in batting average turns out to be not just meaningless but outright misleading: these two players were nowhere close to each other in offensive production, so exactly what good is batting average doing for us?
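Weighting offensive events and adding them up, the approach behind metrics like Adjusted Batting Runs, can be sketched in a few lines. The run values below are hypothetical round numbers invented for this example; they are not Baseball-Reference’s weights, and real linear-weights systems derive theirs empirically from scoring data. The two stat lines are also invented, built so that the hitter with the higher batting average produces fewer runs.

```python
# Hypothetical run values per event, invented for illustration. These are NOT
# the weights behind Baseball-Reference's Adjusted Batting Runs.
WEIGHTS = {
    "1B": 0.50, "2B": 0.80, "3B": 1.10, "HR": 1.40,
    "BB": 0.30, "HBP": 0.30,
    "OUT": -0.25,  # every out costs the offense something
}


def batting_runs(line: dict) -> float:
    """Weight each offensive event and add 'em up."""
    return sum(WEIGHTS[event] * count for event, count in line.items())


def simple_average(line: dict) -> float:
    """Rough batting average, assuming every out came in an official at bat."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    return hits / (hits + line["OUT"])


# Two hypothetical hitters (not real players).
contact = {"1B": 175, "2B": 20, "3B": 8, "HR": 4, "BB": 25, "HBP": 2, "OUT": 420}
power = {"1B": 90, "2B": 35, "3B": 2, "HR": 40, "BB": 110, "HBP": 5, "OUT": 360}

for name, line in [("contact", contact), ("power", power)]:
    print(name, f"{simple_average(line):.3f}", f"{batting_runs(line):.1f}")
# contact 0.330 21.0
# power 0.317 75.7
```

The contact hitter wins the batting-average column and loses the one that actually tracks run production, because the power hitter’s walks, extra-base hits, and 60 fewer outs outweigh the extra singles.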
In 1991, Barry Bonds was the best player in the National League by a wide margin, and should have walked away with his second straight NL MVP award. He led the NL in on-base percentage that season, ranked fourth in slugging percentage, and even finished second in runs batted in, a statistic that at the time was a major criterion for MVP voters. But Bonds lost the award to Atlanta’s Terry Pendleton, whose primary achievement that year was leading the league in batting average at .319. Bonds was by far the more valuable hitter; he reached base 29 more times than Pendleton did, despite having 10 fewer plate appearances. They hit for almost identical slugging percentages. Bonds had 3 more homers and stole 33 more bases. Both were excellent defenders. Pendleton didn’t do anything better than Bonds except hit for average, but that was enough to carry him to the MVP award by a slim margin, as he received 12 of the 24 first-place votes as opposed to 10 for Bonds. Had the writers gotten it right, Bonds would have won four straight MVPs from 1990 to 1993, something no player had done before, and no player would do until Bonds himself did it in 2001–04.
Some of the statistics I discuss in Part One are quite useless for evaluating a player’s performance or the value he delivered to his team. Batting average isn’t useless, but it does not do what it has long been supposed to do. It doesn’t tell us how good a hitter Joey Bagodonuts is. It doesn’t let us compare one hitter to another and say one is better than the other at anything specific. It doesn’t tell us that someone is a better hitter for contact or for power, or better at getting on base. Whatever batting average does give us, we can get that same information, and more, from other, equally simple metrics.
So if the appeal of batting average as the lord of hitting stats isn’t accuracy, or ease of calculation, then what is it? In many ways, the adherence to batting average isn’t easy to explain, because it just isn’t that logical. Batting average is emblematic of how the weight of baseball history can be the largest impediment to success on the field. The emphasis on batting average when smarter stats are out there embodies the false dichotomy we’ve seen in baseball coverage over the last fifteen years, whether it’s pitched as “scouts versus stats” or traditional versus modern: the writers and fans who profess to disdain statistical analysis in fact rely very heavily on their own statistics—the ones they’ve used their whole lives. These statistics, like batting average, pitcher wins, and others I’ll cover, are simpler to calculate or count, but they give us an incomplete or sometimes plain inaccurate picture of what a player did to help his team. Yet because they’ve been around forever, many fans don’t want to let them go.
There’s no such respect for tradition here: if the old stats don’t work, throw them out and get new ones. But first we have to take the rest of the trash out.
2
Pitcher Wins: One Guy Gets the Credit for Everyone Else’s Work
“In pitching, the only thing that really matters is wins.”
—headline of Paul Hoynes’s Rant of the Week, Cleveland Plain Dealer, September 11, 2010
For as long as you’ve been a baseball fan, you’ve been inundated with the message that a pitcher—or at least a starting pitcher—is his win total. So-and-so is a 20-game winner. What’s-his-face has a low ERA but “only” went 12-13. Until fairly recently, other metrics that might give us some sort of indication of how well he actually pitched paled in comparison to the mighty won-lost record; preventing runs wasn’t enough, but somehow, the pitcher had to will his teammates to score more runs while he was in the game while simultaneously exhorting his relievers to pitch well after his departure.
This line of thinking, of course, is dumber than a sack of hair. In baseball, team victories matter, but the idea of a single player earning full credit for a win or blame for a loss exposes a deep ignorance of how the game actually plays out on the field. If you’ve ever watched an actual game of baseball, you know that the sport doesn’t function this way: even a pitcher who throws a perfect game gets some help somewhere—from his defense, from his catcher, and of course from the offense that scored at least one run so he didn’t have to go out and pitch the tenth inning—which happened to Pedro Martinez in 1995 while he was still a Montreal Expo. Pedro threw nine perfect innings against the Padres, but the Expos couldn’t push a run across until the tenth inning; only after that did he qualify for the win despite retiring all 27 batters he’d faced to that point. As the pitcher, Martinez couldn’t have done any more to help his team win the game, but he didn’t “earn” the victory until his teammates scored. This is because the entire thought process that led us to this point, where a starting pitcher gets that credit or blame, is both out of date and very, very stupid.
Once upon a time, when men were men who ate giant hunks of raw meat for sustenance, the job of the starting pitcher was vastly different from what it is today. Starters in the late nineteenth century and well into the early twentieth century typically threw complete games, and might pitch every third day or, in some extreme examples from premodern baseball, every other day. (Check out Old Hoss Radbourn’s career stats for some seasons that look like they came from a different sport entirely—which, in practical terms, they did.) Relievers would enter the game to clean up only after a starting pitcher had struggled and the game’s outcome was probably already determined.
Before 1920, offensive levels were so low in general that we now refer to that time in baseball’s history as the “dead ball” era; any hitter who reached double digits in home runs would likely lead his league for that season, so completing a game was a less arduous task for a pitcher than it would be even ten years after that era ended. The ball wasn’t really dead, but hitters were taught to put the ball in play and often were satisfied with hitting the ball on the ground, or, as Wee Willie Keeler supposedly said, to “hit it where they [the fielders] ain’t.” That meant swinging early in the count and keeping the ball in the park, so the idea of a pitch count (a tally of how many pitches a starting pitcher has thrown to that point in the game) would have struck even coaches or executives of the time as meaningless. (There are still occasional troglodyte comments from coaches and ex-pitchers about how today’s starters are “babied” with pitch counts, which exist to try to keep pitchers healthy even though the direct relationship between pitch counts and injuries is not precisely known. Better we just run young arms right into the ground, I suppose.)
After 1920, offensive levels in baseball changed, spurred in part by the rise of Babe Ruth, who had seasons where he would out-homer entire opposing teams by himself before the league started to catch on and both acquire more power hitters and teach hitters to try to drive the ball. Yet the job of the starter remained essentially the same until the late 1940s and the 1950s, when we began to see the ancestors of today’s modern relievers, pitchers who have been retroactively credited with saves and appear to the modern observer as “closers.” (The save, a terrible statistic in its own right, gets its own two minutes’ hate in a later chapter.) Baseball teams had also settled into the four-man rotation, which would last into the 1970s even though pitchers’ careers frequently ended in that time period due to injuries that, from our modern perspective, appear to be related to overuse.
Today’s pitching staff usage bears little resemblance to the patterns of a century ago. Starters are rarely asked to turn over a lineup (that is, to face each hitter in the opposing batting order) a fourth time, and sometimes they turn it over only twice in one start before the manager makes the call to the bullpen. We live now in an era of pitch counts, where 100 is seen as a magic number (because people have ten fingers, making 100 a pretty round number), and 120 is the top end of what a major-league starter might be asked to do. Pitchers work in five-man rotations, almost never throwing on short rest in the regular season, and skip starts or hit the disabled list at the first sign of trouble in their elbows or shoulders. It may be this new paradigm, rather than a recognition that the pitcher win is the homunculus of baseball stats, that finally kills this number once and for all.
In 1904, New York Highlanders pitcher Jack Chesbro started 51 games, completed 48 of them, threw 454 innings, and was credited with 41 wins on the season. Since the start of the modern, two-league era in 1901, no other starting pitcher has “won” more than 40 games in a season. Entering the 2017 season, there hasn’t even been a pair of teammates anywhere in baseball who combined for 40 pitcher wins in a single season since 2002. These pitcher wins are still going somewhere—the accounting rules of baseball require someone to get a win, even if nobody pitched particularly well—but now they’re going to relievers who might do a fraction of the work of the starter and sometimes are merely the pitcher of record at the time that his team happened to score. A reliever is said to have “vultured” a win if he entered the game with his team in the lead, gave up the tying run (or worse), and then was still the active pitcher when his team retook the lead that they kept till the end of the game. The game on the field hasn’t changed at all, but the methods of accounting that were developed, arbitrarily, more than a century ago are no longer capable of describing what happened on the field in any meaningful way.
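The accounting rule behind a “vultured” win can be sketched as a tiny function. This is a deliberate simplification, not the official scoring rule: it credits the pitcher of record when his team takes the lead it never relinquishes, and it ignores the requirement that a starter pitch at least five innings to qualify, as well as the official scorer’s discretion. The event format and names are hypothetical.

```python
from typing import NamedTuple, Optional


class RunScored(NamedTuple):
    """Score right after a run crosses the plate (hypothetical event format)."""
    ours: int
    theirs: int
    our_pitcher: str  # our pitcher of record at that moment


def winning_pitcher(runs: list) -> Optional[str]:
    """Simplified rule: credit our pitcher of record at the moment we took the
    lead we never gave back. Ignores the starter's five-inning minimum and
    the official scorer's discretion."""
    pitcher = None
    leading = False
    for run in runs:
        now_leading = run.ours > run.theirs
        if now_leading and not leading:
            pitcher = run.our_pitcher  # a new lead: this pitcher is in line for the win
        leading = now_leading
    return pitcher if leading else None  # no final lead, no win for our side


vulture_game = [
    RunScored(1, 0, "Starter"),
    RunScored(3, 0, "Starter"),
    RunScored(3, 2, "Starter"),
    RunScored(3, 3, "Reliever"),  # the reliever gives up the tying run...
    RunScored(3, 4, "Reliever"),
    RunScored(5, 4, "Reliever"),  # ...then his team retakes the lead for good
]
print(winning_pitcher(vulture_game))                    # Reliever
print(winning_pitcher([]))                              # None
print(winning_pitcher([RunScored(1, 0, "Martinez")]))   # Martinez
```

The empty list mirrors Martinez’s nine perfect innings before the Expos scored: with no lead, the function has no one to credit, no matter how well anyone pitched.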