by Nate Silver
Pedroia turned to enter the dugout, where he sat all by himself. This seemed like the perfect moment to catch him, so I mustered up my courage.
“Hey, Dustin, ya got a minute?”
Pedroia stared at me suspiciously for a couple of seconds, and then declared—in as condescending a manner as possible, every syllable spaced out for emphasis: “No. I don’t. I’m trying to get ready for the big-league-base-ball-game.”
I hung around the field for a few minutes trying to recover my dignity before ambling up to the press box to watch the game.
The next day, after my credential had expired and I’d headed back to New York, I sent my friend David Laurila, a former colleague of mine at Baseball Prospectus and a veteran interviewer, on a reconnaissance mission to see if he could get something more useful out of Pedroia. But Pedroia wasn’t much more talkative, giving Laurila about the blandest quote imaginable. “You know what? I’m a guy who doesn’t care about numbers and stats,” he told Laurila. “All I care about is W’s and L’s. I care about wins and losses. Nothing else matters to me.”
Pedroia had learned to speak in this kind of cliché after getting himself into all kinds of trouble when straying from the party line. Like the time he called his hometown of Woodland, California, a “dump.”7 “You can quote me on that,” Pedroia told Boston magazine, “I don’t give a shit.”
He doesn’t give a shit. I would come to realize that without that attitude, Pedroia might have let the scouting reports go to his head and never have made the big leagues.
Building a Baseball Forecasting System
I have been a fan of baseball—and baseball statistics—for as long as I can remember. My hometown team, the Detroit Tigers, won the World Series in 1984 when I was six. As an annoying little math prodigy, I was attracted to all the numbers in the game, buying my first baseball card at seven, reading my first Elias Baseball Analyst at ten, and creating my own statistic at twelve. (It somewhat implausibly concluded that the obscure Red Sox infielder Tim Naehring was one of the best players in the game.)
My interest peaked, however, in 2002. At the time Michael Lewis was busy writing Moneyball, the soon-to-be national bestseller that chronicled the rise of the Oakland Athletics and their statistically savvy general manager Billy Beane. Bill James, who twenty-five years earlier had ushered in the Sabermetric era* by publishing a book called The Bill James Baseball Abstract, was soon to be hired as a consultant by the Red Sox. An unhealthy obsession with baseball statistics suddenly seemed like it could be more than just a hobby—and as it happened, I was looking for a new job.
Two years out of college, I was living in Chicago and working as something called a transfer pricing consultant for the accounting firm KPMG. The job wasn’t so bad. My bosses and coworkers were friendly and professional. The pay was honest and I felt secure.
But telling a company how to set prices at its cell phone factory in Malaysia so as to minimize its tax exposure, or hopping a 6 A.M. flight to St. Louis to value contracts for a coal company, was not exactly my idea of stimulating work. It was all too riskless, too prudent, and too routine for a restless twenty-four-year-old, and I was as bored as I’d ever been. The one advantage, however, was that I had a lot of extra time on my hands. So in my empty hours I started building a colorful spreadsheet full of baseball statistics that would later become the basis for PECOTA.
While in college, I had started reading an annual publication called Baseball Prospectus. The series was founded in 1996 by Gary Huckabay, an ebullient and sarcastic redhead who had recruited a team of writers from the early Internet newsgroup rec.sport.baseball, then at the vanguard of the statistical analysis of the sport. Huckabay had sensed a market opportunity: Bill James had stopped publishing his Abstracts in 1988, and most of the products that endeavored to replace them were not as good, or had folded during the prolonged baseball strike of 1994 and 1995. The first Baseball Prospectus, published in 1996, was produced one copy at a time on a laser printer, accidentally omitted the St. Louis Cardinals chapter, and sold just seventy-five copies. But the book quickly developed a cult following, with sales increasing exponentially each year.
Baseball Prospectus was a stat geek’s wet dream. There were the reams and reams of numbers, not just for major-league players but also for minor-league prospects whose performance had been “translated” to the major-league level. The writing was sharp if sometimes esoteric, full of Simpsons references, jokes about obscure ’80s porn films, and sarcastic asides about the group’s least favorite general managers.
But most important were its predictions about how each player would perform in the next season, in the form of a Huckabay-developed projection system called Vladimir. The system seemed to be the next step in the revolution that James had begun.
A good baseball projection system must accomplish three basic tasks:
Account for the context of a player’s statistics
Separate out skill from luck
Understand how a player’s performance evolves as he ages—what is known as the aging curve
The first task is relatively easy. Baseball, uniquely among the major American sports, has always been played on fields with nonstandard dimensions. It’s much easier to put up a high batting average in snug and boxy Fenway Park, whose contours are shaped by compact New England street grids, than in the cavernous environs of Dodger Stadium, which is surrounded by a moat of parking lot. By observing how players perform both at home and on the road, we can develop “park factors” to account for the degree of difficulty that a player faces. (For example, Fred Lynn, an MVP with the Red Sox during the 1970s, hit .347 over the course of his career at Fenway Park but just .264 at every other stadium.) Likewise, by observing what happens to players who switch from the National League to the American League, we can tell quite a bit about which league is better and account for the strength of a player’s competition.
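As a rough illustration of the park-factor idea, the sketch below compares a team's combined hitting at home with its hitting on the road and then uses the ratio to deflate a raw batting average. The function names, the sample totals, and the assumption that a player takes half his at-bats at home are all hypothetical simplifications, not the method of any particular projection system.

```python
def park_factor(home_hits, home_at_bats, road_hits, road_at_bats):
    """Ratio of batting average at home to batting average on the road.

    A value above 1.0 suggests a hitter-friendly home park.
    """
    home_avg = home_hits / home_at_bats
    road_avg = road_hits / road_at_bats
    return home_avg / road_avg


def neutralize(batting_avg, factor, home_share=0.5):
    """Deflate a raw average by the blended difficulty of the player's schedule.

    Assumes (hypothetically) that road parks are neutral on average and
    that `home_share` of the player's at-bats come in his home park.
    """
    blended = home_share * factor + (1 - home_share) * 1.0
    return batting_avg / blended


# Made-up totals for hitters at one park versus the same hitters elsewhere.
factor = park_factor(home_hits=280, home_at_bats=1000,
                     road_hits=250, road_at_bats=1000)
print(f"park factor: {factor:.3f}")
print(f"park-neutral average for a .300 hitter: {neutralize(0.300, factor):.3f}")
```

Real park factors are usually built from several seasons of data and from both teams' statistics, so this should be read only as the shape of the calculation.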
The World’s Richest Data Set
The second chore—separating skill from luck—requires more work. Baseball is designed in such a way that luck tends to predominate in the near term: even the best teams lose about one-third of their ball games, and even the best hitters fail to get on base three out of every five times. Sometimes luck will obscure a player’s real skill level even over the course of a whole year. During a given season, a true .275 hitter has about a 10 percent chance of hitting .300 and a 10 percent chance of hitting .250 on the basis of luck alone.8
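The claim about the .275 hitter can be checked with a simple binomial model. The sketch below treats every at-bat as an independent trial with the same chance of a hit and assumes a 500 at-bat season; both are simplifying assumptions that do not come from the text. Under them, the two tail probabilities should come out in the neighborhood of one in ten.

```python
from math import comb


def prob_at_least(hits_needed, at_bats, true_avg):
    """P(X >= hits_needed) where X ~ Binomial(at_bats, true_avg)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits_needed, at_bats + 1)
    )


AT_BATS = 500     # assumed season length
TRUE_AVG = 0.275  # the hitter's underlying skill level

# 150 hits in 500 at-bats is a .300 season; 125 hits is a .250 season.
p_300_or_better = prob_at_least(150, AT_BATS, TRUE_AVG)
p_250_or_worse = 1 - prob_at_least(126, AT_BATS, TRUE_AVG)

print(f"chance of hitting .300 or better: {p_300_or_better:.1%}")
print(f"chance of hitting .250 or worse:  {p_250_or_worse:.1%}")
```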
What a well-designed forecasting system can do is sort out which statistics are relatively more susceptible to luck; batting average, for instance, is more erratic than home runs. This is even more important for pitchers, whose statistics are notoriously inconsistent. If you want to predict a pitcher’s win-loss record, looking at the number of strikeouts he recorded and the number of walks he yielded is more informative than looking at his W’s and L’s from the previous season, because the former statistics are much more consistent from year to year.
The goal, as in formulating any prediction, is to weed out the root cause: striking batters out prevents them from getting on base, preventing them from getting on base prevents them from scoring runs, and preventing them from scoring runs prevents them from winning games. However, the further downstream you go, the more noise will be introduced into the system: a pitcher’s win-loss record is affected as much by how many runs his offense scores, something that he has essentially no control over, as by how well he pitches. The Seattle Mariners’ star pitcher Felix Hernandez went 19-5 in 2009 but 13-12 in 2010 despite pitching roughly as well in both years, because the Mariners had some epically terrible hitters in 2010.
Cases like these are not at all uncommon and tend to make themselves known if you spend any real effort to sort through the data. Baseball offers perhaps the world’s richest data set: pretty much everything that has happened on a major-league playing field in the past 140 years has been dutifully and accurately recorded, and hundreds of players play in the big leagues every year. Meanwhile, although baseball is a team sport, it proceeds in a highly orderly way: pitchers take their turn in the rotation, hitters take their turn in the batting order, and they are largely responsible for their own statistics.* There are relatively few problems involving complexity and nonlinearity. The causality is easy to sort out.
That makes life easy for a baseball forecaster. A hypothesis can usually be tested empirically, and proven or disproven to a high degree of statistical satisfaction. In fields like economic or political forecasting where the data is much sparser—one presidential election every four years, not hundreds of new data points every year—you won’t have that luxury and your prediction is more likely to go astray.
Behold: The Aging Curve
All this assumes, however, that a player’s skill level is constant from year to year—if only we could separate the signal from the noise, we’d know everything that we needed to. In fact, a baseball player’s skills are in a constant state of flux, and therein lies the challenge.
By looking at statistics for thousands of players, James had discovered that the typical player9 continues to improve until he is in his late twenties, at which point his skills usually begin to atrophy, especially once he reaches his midthirties.10 This gave James one of his most important inventions: the aging curve.
Olympic gymnasts peak in their teens; poets in their twenties; chess players in their thirties11; applied economists in their forties,12 and the average age of a Fortune 500 CEO is 55.13 A baseball player, James found, peaks at age twenty-seven. Of the fifty MVP winners between 1985 and 2009, 60 percent were between the ages of twenty-five and twenty-nine, and 20 percent were aged twenty-seven exactly. This is when the combination of physical attributes and mental attributes needed to play the game well seems to be in the best balance.
FIGURE 3-1: AGING CURVE FOR HITTER
This notion of the aging curve would have been extremely valuable to any team that had read James’s work. Under baseball’s contract rules, players do not become free agents until fairly late in their careers: after they’ve played at least six full major-league seasons (before then, they are under the exclusive control of the club that drafted them and cannot command a full market price). Since the typical rookie reaches the big leagues at twenty-three or twenty-four years old, he might not become a free agent until he is thirty—just after his window of peak performance has passed. Teams were paying premium dollars for free agents on the assumption that they would replicate in their thirties the production they had exhibited in their twenties; in fact, it usually declined, and since Major League Baseball contracts are guaranteed, the teams had no recourse.
But James’s aging curve painted too smooth a picture. Sure, the average player might peak at age twenty-seven. As anyone who has paid his dues staring at the backs of baseball cards can tell you, however, players age at different paces. Bob Horner, a third baseman with the Atlanta Braves during the 1980s, won the Rookie of the Year award when he was just twenty and made the All-Star team when he was twenty-four; the common assumption at the time was that he was bound for the Hall of Fame. But by age thirty, set back by injuries and an ill-advised stint with the Yakult Swallows of the Japanese League, he was out of professional baseball entirely. On the other hand, the Seattle Mariner great Edgar Martinez did not have a steady job in the big leagues until he was twenty-seven. He was a late bloomer, however, having his best years in his late thirties and leading the league in RBIs when he was forty.
Although Horner and Martinez may be exceptional cases, it is quite rare for players to follow the smooth patterns of development that the aging curve implies; instead, a sort of punctuated equilibrium of jagged peaks and valleys is the norm.
Real aging curves are noisy—very noisy (figure 3-2). On average, they form a smooth-looking pattern. But the average, like the family with 1.7 children, is just a statistical abstraction. Perhaps, Gary Huckabay reasoned, there was some signal in the noise that James’s curve did not address. Perhaps players at physically demanding positions like shortstop tended to see their skills decline sooner than those who played right field. Perhaps players who are more athletic all around can be expected to have longer careers than those who have just one or two strong skills.
FIGURE 3-2: NOISY AGING PATTERNS FOR DIFFERENT HITTERS
Huckabay’s system hypothesized that there are twenty-six distinct aging curves, each applying to a different type of player.14 If Huckabay was correct, you could assess which curve was appropriate to which player and could therefore predict how that player’s career would track. If a player was on the Bob Horner track, he might have an early peak and an early decline. Or if he was more like Martinez, his best seasons might come later on.
While Huckabay’s Vladimir nailed some of its predictions, it ultimately was not much more accurate than the slow-and-steady type of projections developed by James15 that applied the same aging curve to every player. Some of the issue was that twenty-six was an arbitrary number for Huckabay’s categories, and it required as much art as science to figure out which group a player belonged in.
However, a person must have a diverse array of physical and mental skills to play baseball at an elite level: muscle memory, physical strength, hand-eye coordination, bat speed, pitch recognition, and the willpower to stay focused when his team endures a slump. Vladimir’s notion of different aging curves seemed like a more natural fit for the complexities inherent in human performance. In developing PECOTA, I tried to borrow some elements from Huckabay and some from Bill James.
In the 1986 Baseball Abstract, James introduced something called similarity scores, which as their name implies are designed to assess the statistical similarity between the career statistics of any two major-league players. The concept is relatively simple. They start out by assigning a score of 1,000 points between a set of two players, and then deduct points for each difference between them.16 Highly similar players might maintain scores as high as 950 or even 975, but the discrepancies quickly add up.
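The start-at-1,000-and-deduct mechanic can be sketched in a few lines. The statistics chosen and the penalty weights below are illustrative assumptions, not a reproduction of James's published formula, which covers more categories than are shown here.

```python
# Hypothetical weights: how many units of difference in each career
# statistic cost one point. These are illustrative only.
PENALTY_UNITS = {"games": 20, "hits": 15, "home_runs": 2, "walks": 25}


def similarity_score(player_a, player_b, penalty_units=PENALTY_UNITS):
    """Start at 1,000 and deduct points for each statistical difference."""
    score = 1000.0
    for stat, units in penalty_units.items():
        score -= abs(player_a[stat] - player_b[stat]) / units
    return score


# Two made-up career lines.
a = {"games": 1000, "hits": 1100, "home_runs": 100, "walks": 400}
b = {"games": 900, "hits": 1010, "home_runs": 120, "walks": 350}

print(similarity_score(a, b))  # deductions: 5 + 6 + 10 + 2 = 23 points
```

Because every difference only subtracts, a player compared with himself scores exactly 1,000, and the score is the same whichever player is listed first.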
The similarity scores are extremely satisfying for anybody with a working knowledge of baseball history. Rather than look at a player’s statistics in a vacuum, they provide some sense of historical context. Pedroia’s statistics through age twenty-five, for instance, were similar to those of Rod Carew, the Panamanian great who led the Minnesota Twins in the 1970s, or to Charlie Gehringer, a Depression-era star for the Tigers.
James mainly intended for his similarity scores to be backward looking: to analyze, for instance, how suitable a player’s statistics were for the Hall of Fame. If you were trying to make the case that your favorite player belonged in Cooperstown, and you had observed that 9 of the 10 players with the most similar statistics had made it there, you’d have a very strong argument.
But couldn’t similarity scores be predictive, too? If we could identify, say, the one hundred players who were most comparable to Pedroia through a given age, might not the performance of those players over the balance of their careers tell us something about how Pedroia was likely to develop?
This was the idea that I set out to work on—and slowly, over the course of those long days at KPMG in 2002, PECOTA began to develop. It took the form of a giant, colorful Excel spreadsheet—fortuitously so, since Excel was one of the main tools that I used in my day job at KPMG. (Every time one of my bosses walked by, they assumed I was diligently working on a highly elaborate model for one of our clients.17)
Eventually, by stealing an hour or two at a time during slow periods during the workday, and a few more while at home at night, I developed a database consisting of more than 10,000 player-seasons (every major-league season since World War II was waged18) as well as an algorithm to compare any one player with another. The algorithm was somewhat more elaborate than James’s and sought to take full advantage of baseball’s exceptionally rich data set. It used a different method for comparing a set of players—what is technically known as a nearest neighbor analysis. It also considered a wider variety of factors—including things like a player’s height and weight, that are traditionally more in the domain of scouting.
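A nearest neighbor analysis of this general kind can be sketched as follows. This is a minimal illustration and not PECOTA's actual algorithm: the choice of features (batting average, height, weight), the standardization step, the Euclidean distance, and the tiny made-up data set are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def nearest_neighbor_forecast(target, history, k=3):
    """Forecast from the k historical players most similar to `target`.

    `history` is a list of (features, next_season_value) pairs.
    Returns the average next-season value of the k nearest players,
    along with those comparable players.
    """
    n_features = len(target)
    columns = [[feats[i] for feats, _ in history] for i in range(n_features)]
    means = [mean(col) for col in columns]
    # Fall back to 1.0 if a feature never varies, to avoid dividing by zero.
    spreads = [pstdev(col) or 1.0 for col in columns]

    def scale(feats):
        # Put every feature on a comparable scale before measuring distance.
        return [(x - m) / s for x, m, s in zip(feats, means, spreads)]

    scaled_target = scale(target)

    def distance(feats):
        return sqrt(sum((a - b) ** 2 for a, b in zip(scale(feats), scaled_target)))

    comparables = sorted(history, key=lambda row: distance(row[0]))[:k]
    return mean(value for _, value in comparables), comparables


# Made-up rows: (batting average, height in inches, weight in pounds),
# paired with the batting average each player posted the following season.
history = [
    ((0.305, 70, 175), 0.310),
    ((0.295, 69, 185), 0.290),
    ((0.250, 76, 230), 0.240),
    ((0.260, 75, 220), 0.255),
    ((0.302, 68, 178), 0.300),
]
forecast, comparables = nearest_neighbor_forecast((0.300, 69, 180), history, k=3)
print(f"forecast: {forecast:.3f}")
```

Keeping the list of comparables, and not only their average, preserves the spread of outcomes among similar players, which can be as informative as the central estimate.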
Like Huckabay’s system, PECOTA provided for the possibility that different types of players might age in different ways. But it didn’t try to force them onto one of twenty-six development curves; instead, it let this occur naturally by identifying a set of comparable players somewhere in baseball’s statistical galaxy. If it turned out, for instance, that a disproportionate number of Dustin Pedroia’s comparable players turned into strong major leaguers, that might imply something about Pedroia’s chances for success.
More often, however, a player’s most comparable players will be a mixed bag; the paths of players who might have similar statistics through a given point in their careers can diverge wildly thereafter. I mentioned that under James’s similarity scores, Pedroia was found to be similar to Charlie Gehringer and Rod Carew, two players who had long and illustrious careers and who eventually made the Hall of Fame. But Pedroia’s statistics over that period were also similar to Jose Vidro, an undistinguished second baseman for the Montreal Expos.
These differences can be especially dramatic for minor-league players. In 2009, when PECOTA identified the top comparables for Jason Heyward, then a nineteen-year-old prospect in the Atlanta Braves system, you could find everything from Hall of Famer to murder victim. Chipper Jones, Heyward’s number-two comparable, is an example of the former case: one of the greatest Atlanta Braves ever, he’s played seventeen seasons with the club and is a lifetime .304 hitter with more than 450 home runs. On the other hand, there was Dernell Stenson, a promising prospect whose numbers were also similar to Heyward’s. After playing in a developmental-league game in Arizona in 2003, he was tied up, shot, and run over with his own SUV in an apparently random act of violence.