by Simon Singh
To appreciate James’s concerns, imagine that a batter has hit a ball into the air far from any fielders. A speedy fielder dashes fifty yards, gets to the ball just in time, but fumbles the catch. This is marked down as an error. Later in the game, a sluggish fielder is faced with the same scenario, but he is unable to reach halfway to where the ball lands and has no hope of even attempting a catch. Crucially, this is not marked down as an error, because the fielder did not fumble or drop the ball.
Based on this information alone, which player would you prefer to have on your team? The obvious answer is the faster player, because next time he might make the catch, whereas the slower player will always be too slow to have any chance of doing something useful in this scenario.
However, according to the error statistics, the faster player made an error, while the slower player did not. So, if we were to pick a player based on the error statistics alone, then we would pick the wrong player. This was the sort of statistic that kept James awake at night. It had the potential to give a false impression of a player’s performance.
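The trap can be sketched with a few invented numbers. Everything in the snippet below is hypothetical: the `Fielder` class, its fields, and the counts are illustrative assumptions, and the "clean rate" is only a simplified stand-in for an error-based rating. It counts errors against the balls a fielder actually reached, and then compares that with the share of all balls hit his way that he turned into catches.

```python
from dataclasses import dataclass


@dataclass
class Fielder:
    """Hypothetical season summary for one fielder (all numbers invented)."""
    name: str
    balls_in_zone: int    # balls hit into his part of the field
    chances_reached: int  # balls he was fast enough to get a glove on
    errors: int           # fumbles among the balls he reached

    def clean_rate(self) -> float:
        # Error-based view: only balls he reached can count against him.
        return (self.chances_reached - self.errors) / self.chances_reached

    def plays_made_rate(self) -> float:
        # Range-aware view: every ball hit his way counts.
        return (self.chances_reached - self.errors) / self.balls_in_zone


fast = Fielder("speedy", balls_in_zone=100, chances_reached=90, errors=5)
slow = Fielder("sluggish", balls_in_zone=100, chances_reached=60, errors=1)

for f in (fast, slow):
    print(f"{f.name}: clean rate {f.clean_rate():.3f}, "
          f"plays made {f.plays_made_rate():.3f}")
```

With these made-up figures the sluggish fielder has the better error record, while the speedy fielder converts far more balls into outs, which is the mismatch described above.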
Of course, James was not the first person to be concerned about the abuse and misuse of statistics. Mark Twain famously popularized the statement: “There are three kinds of lies: lies, damned lies, and statistics.” In a similar vein, the chemist Fred Menger wrote: “If you torture data sufficiently, it will confess to almost anything.” However, James was convinced that statistics could be a great force for good. If only he could identify the right set of statistics and interpret them correctly, he believed he would gain a profound insight into the true nature of baseball.
Each night he would stare at the data, jot down some equations, and test various hypotheses. Eventually, he began to develop a useful statistical framework and he organized his theories into a slim pamphlet titled 1977 Baseball Abstract: Featuring 18 Categories of Statistical Information That You Just Can’t Find Anywhere Else. He advertised it in the Sporting News and was able to sell seventy-five copies.
The sequel, 1978 Baseball Abstract, contained forty thousand statistics and was more successful, selling 250 copies. In his 1979 Baseball Abstract, James explained his motivation for publishing all these statistics: “I am a mechanic with numbers, tinkering with the records of baseball games to see how the machinery of baseball offense works. I do not start with the numbers any more than a mechanic starts with a monkey wrench. I start with the game, with the things that I see there and the things that people say there. And I ask: Is it true? Can you validate it? Can you measure it?”
Year after year, James witnessed a growing readership for his Baseball Abstract, as like-minded number crunchers realized that they had discovered a guru. The novelist and journalist Norman Mailer became a fan, as did the baseball fanatic and actor David Lander, who played Squiggy on the TV show Laverne and Shirley. One of James’s youngest fans was Tim Long, who would go on to join the writing team of The Simpsons, write the script for “MoneyBART,” and feature a copy of one of James’s books alongside Lisa Simpson.
Further Observations About the Murky World of Statistics
“He uses statistics as a drunken man uses a lamppost—for support rather than illumination.”
—ANDREW LANG
“42.7 percent of all statistics are made up on the spot.”
—STEVEN WRIGHT
“Giving a school man only a little, or very superficial, knowledge of statistics is like putting a razor in the hands of a baby.”
—CARTER ALEXANDER
“Then there is the man who drowned crossing a stream with an average depth of six inches.”
—W. I. E. GATES
“I always find that statistics are hard to swallow and impossible to digest. The only one I can ever remember is that if all the people who go to sleep in church were laid end to end they would be a lot more comfortable.”
—MRS. MARTHA TAFT
“The average human has one breast and one testicle.”
—DES MACHALE
While heading to a conference on board a train, three statisticians meet three biologists. The biologists complain about the cost of the train fare, but the statisticians reveal a cost-saving trick. As soon as they hear the inspector’s voice, the statisticians squeeze into the toilet. The inspector knocks on the toilet door and shouts: “Tickets, please!” The statisticians pass a single ticket under the door, and the inspector stamps it and returns it. The biologists are impressed. Two days later, on the return train, the biologists show the statisticians that they have bought only one ticket, but the statisticians reply: “Well, we have no ticket at all.” Before the biologists can ask any questions, the inspector’s voice is heard in the distance. This time the biologists bundle into the toilet. One of the statisticians secretly follows them, knocks on the toilet door, and calls: “Tickets, please!” The biologists slip the ticket under the door. The statistician takes the ticket, dashes into another toilet with his colleagues, and waits for the real inspector. The moral of the story is simple: “Don’t use a statistical technique that you don’t understand.”
—ANONYMOUS
According to Long, James was his hero as a teenager: “I loved calculus in high school and I was a baseball fan. My dad and I bonded over baseball. However, baseball was nothing but folk wisdom in terms of how it was managed, so I liked the idea of a guy who came along with numbers to disprove a lot of folk wisdom. I was a huge fan of Bill James when I was fourteen.”
Among James’s most avid followers were mathematicians and computer programmers, who were not only absorbing his discoveries but also developing their own insights. Pete Palmer, for example, was a computer programmer and systems engineer at a radar base in the Aleutian Islands, keeping an eye on the Russians. This was the high-tech equivalent of being the night watchman at a pork and beans factory, and just like James, he would think about baseball stats while he was working late into the night. In fact, he had been fascinated by the subject since childhood, when he had obsessively compiled baseball records on his mother’s typewriter. One of his most important contributions was to develop a new statistic known as the on-base plus slugging percentage (OPS), which encapsulated two of the most desirable qualities in a batter, namely the ability to smash a ball out of the park and the less glamorous knack of being able to get on base.
To give you a sense of how Palmer used mathematics to assess batters, the full-blooded formula for OPS is shown here. The first component of OPS is slugging percentage (SLG), which is simply a player’s total number of bases divided by the number of at-bats. The second component is on-base percentage (OBP), which we will discuss later when we return to “MoneyBART,” because Lisa Simpson refers to OBP when picking her team.
The formula for OPS was first popularized in the book The Hidden Game of Baseball, which Palmer co-wrote with baseball historian John Thorn. Please do not feel guilty if you want to skip over this minefield of mathematics and baseball jargon.
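$$\mathrm{OPS} = \mathrm{OBP} + \mathrm{SLG}$$

$$\mathrm{OBP} = \frac{H + BB + HBP}{AB + BB + SF + HBP} \qquad \mathrm{SLG} = \frac{TB}{AB}$$

Therefore,

$$\mathrm{OPS} = \frac{AB \times (H + BB + HBP) + TB \times (AB + BB + SF + HBP)}{AB \times (AB + BB + SF + HBP)}$$

OPS = on-base plus slugging
OBP = on-base percentage
SLG = slugging percentage
H = hits
BB = base on balls
HBP = times hit by pitch
AB = at-bats
SF = sacrifice flies
TB = total bases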
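For readers who prefer code to algebra, the same calculation can be sketched in a few lines of Python. The text gives SLG as total bases divided by at-bats; the OBP expression used here, (H + BB + HBP) / (AB + BB + SF + HBP), is the standard definition. The function names and the sample season line are illustrative assumptions, not figures for any real player.

```python
def obp(h: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """On-base percentage: times on base divided by counted plate appearances."""
    return (h + bb + hbp) / (ab + bb + sf + hbp)


def slg(tb: int, ab: int) -> float:
    """Slugging percentage: total bases divided by at-bats."""
    return tb / ab


def ops(h: int, bb: int, hbp: int, ab: int, sf: int, tb: int) -> float:
    """On-base plus slugging: simply the sum of the two components."""
    return obp(h, bb, hbp, ab, sf) + slg(tb, ab)


def ops_single_fraction(h: int, bb: int, hbp: int, ab: int, sf: int, tb: int) -> float:
    """The same quantity written as one fraction over a common denominator."""
    reached = h + bb + hbp
    appearances = ab + bb + sf + hbp
    return (ab * reached + tb * appearances) / (ab * appearances)


# Hypothetical season line, invented for illustration only.
season = dict(h=150, bb=60, hbp=5, ab=500, sf=5, tb=250)
print(f"OBP {obp(150, 60, 5, 500, 5):.3f}")
print(f"SLG {slg(250, 500):.3f}")
print(f"OPS {ops(**season):.3f}")
```

The two OPS functions agree up to floating-point rounding. The single-fraction form is just the sum of OBP and SLG put over a common denominator.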
Like Palmer and James, Richard Cramer was another part-time amateur statistician who would use mathematics to explore baseball. As a researcher with the pharmaceutical company SmithKline, Cramer had access to considerable computing power, which was supposed to be used to help develop new drugs. Instead, Cramer left the computers running overnight in order to tackle questions in baseball, such as whether or not clutch hitters are a real phenomenon. A clutch hitter is a player who has the special ability of excelling when his team is under the most pressure. Typically, the clutch hitter delivers a big hit when his team is on the verge of losing, particularly in a big game situation. Commentators and pundits have sworn for decades that such players exist, but Cramer decided to check: Do clutch hitters really exist, or are they merely the result of selective recall?
Cramer’s approach was simple, elegant, and entirely mathematical. He would measure players’ performances in ordinary games and in high-pressure situations during a particular season—Cramer chose the 1969 season. A few players did seem to excel at key moments, but was that due to some innate superpower that kicked in when they were under pressure, or was it simply a fluke? The next stage of Cramer’s analysis was to perform the same calculations for the 1970 season; if clutch hitting was a genuine skill possessed by special players, then the clutch hitters in 1969 would surely also be clutch hitters in 1970. On the other hand, if clutch hitting was a fluke, then the supposed clutch hitters of 1969 would be replaced by a new bunch of lucky clutch hitters in 1970. Cramer’s calculations demonstrated that there was no significant relationship between the two sets of clutch hitters across the two seasons. In other words, supposed clutch hitters in one season could not maintain their performance. They were not particularly clutchy, just lucky.
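The logic of this two-season test can be sketched as a small simulation. This is not Cramer's actual code or data: the player count, at-bat totals, range of batting averages, and function names below are all assumptions made for illustration. Each simulated player has exactly the same true ability in clutch and ordinary situations, so any "clutch bonus" in a season is pure luck, and the sketch checks how well one season's bonus predicts the next.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulated_average(rng, true_avg, at_bats):
    """Batting average over a run of at-bats, each a hit with probability true_avg."""
    hits = sum(rng.random() < true_avg for _ in range(at_bats))
    return hits / at_bats


def clutch_differentials(rng, true_avgs, ordinary_ab=450, clutch_ab=100):
    """Clutch average minus ordinary average for every player in one season."""
    return [
        simulated_average(rng, p, clutch_ab) - simulated_average(rng, p, ordinary_ab)
        for p in true_avgs
    ]


rng = random.Random(1969)  # fixed seed so the run is repeatable
true_avgs = [rng.uniform(0.240, 0.300) for _ in range(200)]  # assumed talent spread

first_season = clutch_differentials(rng, true_avgs)
second_season = clutch_differentials(rng, true_avgs)
r = pearson(first_season, second_season)
print(f"season-to-season correlation of clutch differentials: {r:.3f}")
```

In a world with no clutch skill, the correlation should hover near zero, which is the pattern Cramer reported for 1969 and 1970. A genuine, persistent clutch ability would instead show up as a clearly positive correlation.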
In his 1984 Baseball Abstract, James explained that he was not surprised: “How is it that a player who possesses the reflexes and the batting stroke and the knowledge and the experience to be a .262 hitter in other circumstances magically becomes a .300 hitter when the game is on the line? How does that happen? What is the process? What are the effects? Until we can answer those questions, I see little point in talking about clutch ability.”
Derek Jeter, who is nicknamed “Captain Clutch” thanks to his batting performances with the New York Yankees, vehemently disagreed with the statisticians. In an interview with Sports Illustrated, he said: “You can take those stats guys and throw them out the window.” Unfortunately, Jeter’s own figures supported James’s conclusion. Averaged across thirteen seasons, Jeter’s batting average/on-base percentage/slugging percentage stats were .317/.388/.462 in regular season games, and .309/.377/.469 (marginally worse) in crucial playoff games.
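The comparison is easy to check by hand, or with a short script such as the sketch below. It uses only the two sets of figures quoted above; the variable names are illustrative, and OPS is taken as the sum of on-base percentage and slugging percentage.

```python
# (batting average, on-base percentage, slugging percentage) as quoted in the text
regular_season = (0.317, 0.388, 0.462)
playoffs = (0.309, 0.377, 0.469)

labels = ("AVG", "OBP", "SLG")
for label, reg, post in zip(labels, regular_season, playoffs):
    print(f"{label}: regular {reg:.3f}, playoffs {post:.3f}, change {post - reg:+.3f}")

# OPS = OBP + SLG
ops_regular = regular_season[1] + regular_season[2]
ops_playoffs = playoffs[1] + playoffs[2]
print(f"OPS: regular {ops_regular:.3f}, playoffs {ops_playoffs:.3f}")
```

Batting average and on-base percentage both dip in the playoffs, while slugging percentage rises slightly. Combined as OPS, the figures come to .850 in the regular season and .846 in the playoffs, so the overall difference is small and, if anything, in the wrong direction for a clutch hitter.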
Of course, all new mathematical disciplines need names, and in due course this empirical, objective, and analytical approach to understanding baseball became known as sabermetrics. The term, coined by James, has its root in SABR, the acronym for the Society for American Baseball Research, an organization set up to foster research into all areas of baseball, such as the history of the game, baseball in relation to the arts, and women in baseball. For two decades, the baseball establishment largely ignored and sometimes even mocked James and his growing band of sabermetric colleagues. However, sabermetrics was eventually vindicated when one team was brave enough to apply it in the most ruthless manner possible and prove that it held the secret to baseball success.
In 1995, the Oakland Athletics baseball team was purchased by Steve Schott and Ken Hofmann, two property developers who made it clear from the outset that the team’s budget had to be slashed. When Billy Beane became general manager in 1997, the Athletics were notorious for having the lowest payroll in Major League Baseball. Without money, it dawned on Beane that his only hope of winning a decent number of games was to rely on statistics. In other words, he would use mathematics to outsmart his wealthier rivals.
A devotee of Bill James, Beane showed his faith in statistics by hiring a stats-obsessed Harvard economics graduate, Paul DePodesta, as his assistant. In turn, DePodesta hired more statistical obsessives, such as Ken Mauriello and Jack Armbruster, a pair of financial analysts who left Wall Street and set up a baseball stats company called Advanced Value Matrix Systems. They analyzed the data from each individual play across hundreds of past games in order to judge the exact contribution of each pitcher, fielder, and hitter. Their algorithms minimized the haphazard influence of luck and effectively placed a dollar figure on every player on every team. This gave Beane the information he needed to acquire undervalued players.
He soon realized that the best bargains appeared on the market at midseason, when teams that were no longer capable of winning their league would cut their losses by selling off players. The law of supply and demand dictated a drop in prices, and Beane was able to use statistics to pinpoint excellent players who had gone unnoticed within struggling teams. Sometimes DePodesta recommended trades or acquisitions that seemed crazy to the traditionalists, but Beane rarely doubted his advice. Indeed, the crazier the deal, the bigger the opportunity to acquire an undervalued player. The power of DePodesta’s mathematics and the resulting midseason deals was already clear by 2001. The Oakland A’s won only 50 percent of their 81 games in the first half of that season; that increased to 77 percent in the second half of the season, and they finished second in the American League West.
This dramatic stats-based improvement was later documented in Moneyball, a book by the journalist Michael Lewis, who followed Beane’s adventures with sabermetrics over the course of several seasons. Of course, the title of the episode of The Simpsons in which Lisa becomes a baseball coach, “MoneyBART,” is based on the title of Lewis’s book. Moreover, in the picture here, the third book below Lisa’s computer is Moneyball. Hence, we can be sure that Lisa is fully aware of Billy Beane and his commitment to implementing sabermetrics in its purest form.
Unfortunately, Beane lost three of his key players to the New York Yankees at the end of the 2001 season. The Yankees could simply afford to sabotage their rivals by buying up the talent; the Yankees’ payroll was $125 million, whereas bargain basement teams like the Oakland A’s were forced to survive on $40 million. Lewis described the situation thus: “Goliath, dissatisfied with his size advantage, has bought David’s sling.”
Hence, the 2002 season got off to a bad start for the Athletics, yet again. However, DePodesta’s computer highlighted some cheap midseason deals that more than compensated for those players lost to the Yankees. In fact, sabermetrics resulted in the Oakland A’s finishing on top of the American League West after completing a remarkable late-season winning streak of twenty games in a row, which broke the American League record. This was the ultimate victory of logic over dogma. Sabermetrics had resulted in arguably the greatest achievement in baseball in modern times.
When Lewis published Moneyball the following year, he admitted that he had occasionally doubted Beane’s reliance on mathematics: “My problem can be simply put: every player is different. Every player must be viewed as a special case. The sample size is always one. [Beane’s] answer is equally simple: baseball players follow similar patterns, and these patterns are etched in the record books. Of course, every so often some player may fail to embrace his statistical destiny, but on a team of twenty-five players the statistical aberrations will tend to cancel each other out.”
Moneyball brought Beane to public attention as the maverick hero who had enough confidence in sabermetrics to challenge baseball’s orthodoxy. He also gained admirers in other sports, such as soccer, as discussed in appendix 1. Even those who were not sports fans became aware of Beane’s success when Hollywood released Moneyball, an Oscar-nominated film based on Lewis’s book, starring Brad Pitt as Billy Beane.
Naturally, Beane’s success persuaded rival teams to adopt Oakland’s approach and hire sabermetricians. The Boston Red Sox hired Bill James prior to the 2003 season, and a year later the father of sabermetrics helped the team win the World Series for the first time in eighty-six years, breaking the so-called Curse of the Bambino. Eventually, full-time sabermetricians were also hired by the Los Angeles Dodgers, New York Yankees, New York Mets, San Diego Padres, St. Louis Cardinals, Washington Nationals, Arizona Diamondbacks, and Cleveland Indians. However, one baseball team has surpassed all these in terms of proving the power of mathematics, namely the Springfield Isotots led by Lisa Simpson.
In “MoneyBART,” when Lisa leaves Moe’s Tavern armed with books about mathematics, she is determined to employ statistics to help the Isotots win. Sure enough, she successfully uses spreadsheets, computer simulations, and detailed analysis to transform the Isotots from “cellar dwellers” into the second-best team in the league behind Capital City. However, when Lisa tells Bart not to swing at anything in a game against Shelbyville, he disobeys her instructions . . . and wins the game. According to Lisa, however, Bart’s home run was just a fluke. Indeed, she feels that such insubordination could potentially undermine her statistical strategy and destroy the team’s future hopes. Hence, she throws Bart off the team, because “he thought he was better than the laws of probability.”
Having noted that Nelson Muntz has the highest on-base percentage, Lisa follows the tenets of sabermetrics and makes him the new lead-off hitter, whose most important task is to get on base. Lisa clearly agrees with her fellow sabermetrician Eric Walker, who views the significance of on-base percentage as follows: “Simply yet exactly put, it is the probability that the batter will not make an out. When we state it that way, it becomes, or should become, crystal clear that the most important isolated (one-dimensional) offensive statistic is the on-base percentage. It measures the probability that the batter will not be another step toward the end of the inning.”
Sure enough, thanks to Lisa’s knowledge of on-base percentage, the Isotots continue their winning streak. One commentator declares her success as “a triumph of number crunching over the human spirit.”
The Isotots duly make it to the Little League State Championship, where they play Capital City. Unfortunately, one of Lisa’s players, Ralph Wiggum, is incapacitated by a juice overdose, so she is forced to ask Bart to return to the team. He accepts the invitation with reluctance, because he knows that he will be faced with a dilemma: Does he follow his instinct or follow Lisa’s mathematically based tactics? With Capital City leading the Isotots 11–10 in the ninth and final inning, Bart again decides to disobey Lisa. This time he makes the final out and the Isotots lose, all because of Bart’s failure to follow the sabermetric gospel.