by Nate Silver
Consider the case of the National Journal Political Insiders’ Poll, a survey of roughly 180 politicians, political consultants, pollsters, and pundits. The survey is divided between Democratic and Republican partisans, but both groups are asked the same questions. Regardless of their political persuasions, this group leans hedgehog: political operatives are proud of their battle scars, and see themselves as locked in a perpetual struggle against the other side of the cocktail party.
A few days ahead of the 2010 midterm elections, National Journal asked its panelists whether Democrats were likely to retain control of both the House and the Senate.21 There was near-universal agreement on these questions: Democrats would keep the Senate but Republicans would take control of the House (the panel was right on both counts). The Democratic and Republican insiders were also close to agreement on the overall magnitude of Republican gains in the House; the Democratic experts called for them to pick up 47 seats, while Republicans predicted a 53-seat gain—a trivial difference considering that there are 435 House seats.
National Journal, however, also asked its panelists to predict the outcome of eleven individual elections, a mix of Senate, House, and gubernatorial races. Here, the differences were much greater. The panel split on the winners they expected in the Senate races in Nevada, Illinois, and Pennsylvania, the governor’s race in Florida, and a key House race in Iowa. Overall, Republican panelists expected Democrats to win just one of the eleven races, while Democratic panelists expected them to win six of the eleven. (The actual outcome, predictably enough, was somewhere in the middle—Democrats won three of the eleven races that National Journal had asked about.22)
Obviously, partisanship plays some role here: Democrats and Republicans were each rooting for the home team. That does not suffice to explain, however, the unusual divide in the way that the panel answered the different types of questions. When asked in general terms about how well Republicans were likely to do, there was almost no difference between the panelists. They differed profoundly, however, when asked about specific cases—these brought the partisan differences to the surface.23
Too much information can be a bad thing in the hands of a hedgehog. The question of how many seats Republicans were likely to gain on Democrats overall is an abstract one: unless you’d studied all 435 races, there was little additional detail that could help you to resolve it. By contrast, when asked about any one particular race—say, the Senate race in Nevada—the panelists had all kinds of information at their disposal: not just the polls there, but also news accounts they’d read about the race, gossip they’d heard from their friends, or what they thought about the candidates when they saw them on television. They might even know the candidates or the people who work for them personally.
Hedgehogs who have lots of information construct stories—stories that are neater and tidier than the real world, with protagonists and villains, winners and losers, climaxes and dénouements—and, usually, a happy ending for the home team. The candidate who is down ten points in the polls is going to win, goddamnit, because I know the candidate and I know the voters in her state, and maybe I heard something from her press secretary about how the polls are tightening—and have you seen her latest commercial?
When we construct these stories, we can lose the ability to think about the evidence critically. Elections typically present compelling narratives. Whatever you thought about the politics of Barack Obama or Sarah Palin or John McCain or Hillary Clinton in 2008, they had persuasive life stories: reported books on the campaign, like Game Change, read like tightly plotted novels. The candidates who ran in 2012 were a less appealing lot but still sufficed to provide the usual ensemble of dramatic clichés, from tragedy (Herman Cain?) to farce (Rick Perry).
You can get lost in the narrative. Politics may be especially susceptible to poor predictions precisely because of its human elements: a good election engages our dramatic sensibilities. This does not mean that you must feel totally dispassionate about a political event in order to make a good prediction about it. But it does mean that a fox’s aloof attitude can pay dividends.
A Fox-Like Approach to Forecasting
I had the idea for FiveThirtyEight* while waiting out a delayed flight at Louis Armstrong New Orleans International Airport in February 2008. For some reason—possibly the Cajun martinis had stirred something up—it suddenly seemed obvious that someone needed to build a Web site that predicted how well Hillary Clinton and Barack Obama, then still in heated contention for the Democratic nomination, would fare against John McCain.
My interest in electoral politics had begun slightly earlier, however—and it had been mostly the result of frustration rather than any affection for the political process. I had carefully monitored Congress’s attempt in 2006 to ban Internet poker, which was then one of my main sources of income. I found political coverage wanting even as compared with something like sports, where the “Moneyball revolution” had significantly improved analysis.
During the run-up to the primary I found myself watching more and more political TV, mostly MSNBC and CNN and Fox News. A lot of the coverage was vapid. Despite the election being many months away, commentary focused on the inevitability of Clinton’s nomination, ignoring the uncertainty intrinsic to such early polls. There seemed to be too much focus on Clinton’s gender and Obama’s race.24 There was an obsession with determining which candidate had “won the day” by making some clever quip at a press conference or getting some no-name senator to endorse them—things that 99 percent of voters did not care about.
Political news, and especially the important news that really affects the campaign, proceeds at an irregular pace. But news coverage is produced every day. Most of it is filler, packaged in the form of stories that are designed to obscure its unimportance.* Not only does political coverage often lose the signal—it frequently accentuates the noise. If there are a number of polls in a state that show the Republican ahead, it won’t make news when another one says the same thing. But if a new poll comes out showing the Democrat with the lead, it will grab headlines—even though the poll is probably an outlier and won’t predict the outcome accurately.
The bar set by the competition, in other words, was invitingly low. Someone could look like a genius simply by doing some fairly basic research into what really has predictive power in a political campaign. So I began blogging at the Web site Daily Kos, posting detailed and data-driven analyses on issues like polls and fundraising numbers. I studied which polling firms had been most accurate in the past, and how much winning one state—Iowa, for instance—tended to shift the numbers in another. The articles quickly gained a following, even though the commentary at sites like Daily Kos is usually more qualitative (and partisan) than quantitative. In March 2008, I spun my analysis out to my own Web site, FiveThirtyEight, which sought to make predictions about the general election.
The FiveThirtyEight forecasting model started out pretty simple—basically, it took an average of polls but weighted them according to their past accuracy—then gradually became more intricate. But it abided by three broad principles, all of which are very fox-like.
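To make the weighting idea concrete, here is a minimal sketch in Python. The pollster names, margins, and accuracy weights are invented for illustration; the actual model drew on many more inputs and a far more elaborate weighting scheme.

```python
# A minimal sketch of a weighted polling average: each poll counts in
# proportion to a weight reflecting its firm's historical accuracy.
# The pollster names, margins, and weights below are hypothetical.

polls = [
    # (pollster, margin for the leading candidate in points, accuracy weight)
    ("Firm A", +4.0, 0.9),
    ("Firm B", +1.5, 0.6),
    ("Firm C", -2.0, 0.3),
]

def weighted_polling_average(polls):
    """Return the accuracy-weighted average margin across polls."""
    total_weight = sum(weight for _, _, weight in polls)
    return sum(margin * weight for _, margin, weight in polls) / total_weight

if __name__ == "__main__":
    print(f"Weighted average margin: {weighted_polling_average(polls):+.1f} points")
```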
Principle 1: Think Probabilistically
Almost all the forecasts that I publish, in politics and other fields, are probabilistic. Instead of spitting out just one number and claiming to know exactly what will happen, I articulate a range of possible outcomes. On November 2, 2010, for instance, my forecast for how many seats Republicans might gain in the U.S. House looked like what you see in figure 2-3.
The most likely range of outcomes—enough to cover about half of all possible cases—was a Republican gain of between 45 and 65 seats (their actual gain was 63 seats). But there was also the possibility that Republicans might win 70 or 80 seats—if almost certainly not the 100 that Dick Morris had predicted. Conversely, there was also the chance that Democrats would hold just enough seats to keep the House.
The wide distribution of outcomes represented the most honest expression of the uncertainty in the real world. The forecast was built from forecasts of each of the 435 House seats individually—and an exceptionally large number of those races looked to be extremely close. As it happened, a remarkable 77 seats were decided by a single-digit margin.25 Had the Democrats beaten their forecasts by just a couple of points in most of the competitive districts, they could easily have retained the House. Had the Republicans done the opposite, they could have run their gains into truly astonishing numbers. A small change in the political tides could have produced a dramatically different result; it would have been foolish to pin things down to an exact number.
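The logic behind a seat-total distribution can be illustrated with a toy Monte Carlo simulation. The district margins, error sizes, and shared national swing below are all invented; the point is only to show how many close races plus a small common shift produce a wide range of plausible totals.

```python
# Toy Monte Carlo: turn per-district forecasts into a distribution of seat
# totals. All inputs (margins, error sizes, shared swing) are hypothetical.
import random

random.seed(0)
N_DISTRICTS = 435
N_SIMS = 10_000

# Hypothetical forecast margins (Republican minus Democrat, in points).
margins = [random.gauss(0, 15) for _ in range(N_DISTRICTS)]

seat_totals = []
for _ in range(N_SIMS):
    national_swing = random.gauss(0, 3)   # error shared by every district
    r_seats = sum(
        1
        for m in margins
        if m + national_swing + random.gauss(0, 5) > 0  # race-specific error
    )
    seat_totals.append(r_seats)

seat_totals.sort()
print("Median Republican seats:", seat_totals[N_SIMS // 2])
print("Middle 50% of outcomes: ",
      seat_totals[N_SIMS // 4], "to", seat_totals[3 * N_SIMS // 4])
```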
This probabilistic principle also holds when I am forecasting the outcome in an individual race. How likely is a candidate to win, for instance, if he’s ahead by five points in the polls? This is the sort of question that FiveThirtyEight’s models are trying to address.
The answer depends significantly on the type of race that he’s involved in. The further down the ballot you go, the more volatile the polls tend to be: polls of House races are less accurate than polls of Senate races, which are in turn less accurate than polls of presidential races. Polls of primaries, also, are considerably less accurate than general election polls. During the 2008 Democratic primaries, the average poll missed by about eight points, far more than implied by its margin of error. The problems in polls of the Republican primaries of 2012 may have been even worse.26 In many of the major states, in fact—including Iowa, South Carolina, Florida, Michigan, Washington, Colorado, Ohio, Alabama, and Mississippi—the candidate ahead in the polls a week before the election lost.
But polls do become more accurate the closer you get to Election Day. Figure 2-4 presents some results from a simplified version of the FiveThirtyEight Senate forecasting model, which uses data from 1998 through 2008 to infer the probability that a candidate will win on the basis of the size of his lead in the polling average. A Senate candidate with a five-point lead on the day before the election, for instance, has historically won his race about 95 percent of the time—almost a sure thing, even though news accounts are sure to describe the race as “too close to call.” By contrast, a five-point lead a year before the election translates to just a 59 percent chance of winning—barely better than a coin flip.
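A back-of-the-envelope version of this relationship treats the final margin as the current polling lead plus a normally distributed error whose spread grows with the time remaining. The error sizes below are hypothetical, roughly tuned to reproduce the two probabilities quoted above, not the historically estimated values the actual model used.

```python
# Rough sketch: win probability from a polling lead, assuming the lead is
# disturbed by a normal error that grows with time to the election.
# The error-size formula is an assumption, not a fitted historical estimate.
import math

def win_probability(lead_points, days_until_election):
    error_std = 3.0 + 0.05 * days_until_election     # assumed error model
    z = lead_points / error_std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # P(final margin > 0)

print(f"5-point lead, 1 day out:    {win_probability(5, 1):.0%}")
print(f"5-point lead, 365 days out: {win_probability(5, 365):.0%}")
```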
The FiveThirtyEight models provide much of their value in this way. It’s very easy to look at an election, see that one candidate is ahead in all or most of the polls, and determine that he’s the favorite to win. (With some exceptions, this assumption will be correct.) What becomes much trickier is determining exactly how much of a favorite he is. Our brains, wired to detect patterns, are always looking for a signal, when instead we should appreciate how noisy the data is.
I’ve grown accustomed to this type of thinking because my background consists of experience in two disciplines, sports and poker, in which you’ll see pretty much everything at least once. Play enough poker hands, and you’ll make your share of royal flushes. Play a few more, and you’ll find that your opponent has made a royal flush when you have a full house. Sports, especially baseball, also provide plenty of opportunities for low-probability events to occur. The Boston Red Sox failed to make the playoffs in 2011 despite having a 99.7 percent chance of doing so at one point27—although I wouldn’t question anyone who says the normal laws of probability don’t apply when it comes to the Red Sox or the Chicago Cubs.
Politicians and political observers, however, find this lack of clarity upsetting. In 2010, a Democratic congressman called me a few weeks in advance of the election. He represented a safely Democratic district on the West Coast. But given how well Republicans were doing that year, he was nevertheless concerned about losing his seat. What he wanted to know was exactly how much uncertainty there was in our forecast. Our numbers gave him, to the nearest approximation, a 100 percent chance of winning. But did 100 percent really mean 99 percent, or 99.99 percent, or 99.9999 percent? If the latter—a 1 in 100,000 chance of losing—he was prepared to donate his campaign funds to other candidates in more vulnerable districts. But he wasn’t willing to take a 1 in 100 risk.
Political partisans, meanwhile, may misinterpret the role of uncertainty in a forecast; they will think of it as hedging your bets and building in an excuse for yourself in case you get the prediction wrong. That is not really the idea. If you forecast that a particular incumbent congressman will win his race 90 percent of the time, you’re also forecasting that he should lose it 10 percent of the time.28 The signature of a good forecast is that each of these probabilities turns out to be about right over the long run.
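Checking that is a mechanical exercise: bin past forecasts by the probability they assigned and compare with the observed frequency. The records below are invented; a real check would run over a logged history of published forecasts.

```python
# Sketch of a calibration check over a (hypothetical) forecast history:
# group forecasts by stated probability and compare with actual outcomes.
from collections import defaultdict

# (stated probability of winning, did the candidate actually win?)
records = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True),
    (0.6, True), (0.6, False), (0.6, True), (0.6, False), (0.6, True),
]

buckets = defaultdict(list)
for prob, won in records:
    buckets[prob].append(won)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    observed_rate = sum(outcomes) / len(outcomes)
    print(f"Forecast {prob:.0%} -> won {observed_rate:.0%} of {len(outcomes)} races")
```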
Tetlock’s hedgehogs were especially bad at understanding these probabilities. When you say that an event has a 90 percent chance of happening, that has a very specific and objective meaning. But our brains translate it into something more subjective. Evidence from the psychologists Daniel Kahneman and Amos Tversky suggests that these subjective estimates don’t always match up with the reality. We have trouble distinguishing a 90 percent chance that the plane will land safely from a 99 percent chance or a 99.9999 percent chance, even though these imply vastly different things about whether we ought to book our ticket.
With practice, our estimates can get better. What distinguished Tetlock’s hedgehogs was that they were too stubborn to learn from their mistakes. Acknowledging the real-world uncertainty in their forecasts would require them to acknowledge the imperfections in their theories about how the world was supposed to behave—the last thing that an ideologue wants to do.
Principle 2: Today’s Forecast Is the First Forecast of the Rest of Your Life
Another misconception is that a good prediction shouldn’t change. Certainly, if there are wild gyrations in your forecast from day to day, that may be a bad sign—either of a badly designed model, or that the phenomenon you are attempting to predict isn’t very predictable at all. In 2012, when I published forecasts of the Republican primaries in advance of each state, solely according to the polls there, the probabilities often shifted substantially just as the polls did.
When the outcome is more predictable—as a general election is in the late stages of the race—the forecasts will normally be more stable. The comment that I heard most frequently from Democrats after the 2008 election was that they turned to FiveThirtyEight to help keep them calm.* By the end of a presidential race, as many as thirty or forty polls might be released every day from different states, and some of these results will inevitably fall outside the margin of error. Candidates, strategists, and television commentators—who have some vested interest in making the race seem closer than it really is—might focus on these outlier polls, but the FiveThirtyEight model found that they usually didn’t make much difference.
Ultimately, the right attitude is that you should make the best forecast possible today—regardless of what you said last week, last month, or last year. Making a new forecast does not mean that the old forecast just disappears. (Ideally, you should keep a record of it and let people evaluate how well you did over the whole course of predicting an event.) But if you have reason to think that yesterday’s forecast was wrong, there is no glory in sticking to it. “When the facts change, I change my mind,” the economist John Maynard Keynes famously said. “What do you do, sir?”
Some people don’t like this type of course-correcting analysis and mistake it for a sign of weakness. It seems like cheating to change your mind—the equivalent of sticking your finger out and seeing which way the wind is blowing.29 The critiques usually rely, implicitly or explicitly, on the notion that politics is analogous to something like physics or biology, abiding by fundamental laws that are intrinsically knowable and predictable. (One of my most frequent critics is a professor of neuroscience at Princeton.30) Under those circumstances, new information doesn’t matter very much; elections should follow a predictable orbit, like a comet hurtling toward Earth.
Instead of physics or biology, however, electoral forecasting resembles something like poker: we can observe our opponent’s behavior and pick up a few clues, but we can’t see his cards. Making the most of that limited information requires a willingness to update one’s forecast as newer and better information becomes available. It is the alternative—failing to change our forecast because we risk embarrassment by doing so—that reveals a lack of courage.
Principle 3: Look for Consensus
Every hedgehog fantasizes that they will make a daring, audacious, outside-the-box prediction—one that differs radically from the consensus view on a subject. Their colleagues ostracize them; even their golden retrievers start to look at them a bit funny. But then the prediction turns out to be exactly, profoundly, indubitably right. Two days later, they are on the front page of the Wall Street Journal and sitting on Jay Leno’s couch, singled out as a bold and brave pioneer.
Every now and then, it might be correct to make a forecast like this. The expert consensus can be wrong—someone who had forecasted the collapse of the Soviet Union would have deserved most of the kudos that came to him. But the fantasy scenario is hugely unlikely. Even though foxes, myself included, aren’t really a conformist lot, we get worried anytime our forecasts differ radically from those being produced by our competitors.
Quite a lot of evidence suggests that aggregate or group forecasts are more accurate than individual ones, often somewhere between 15 and 20 percent more accurate depending on the discipline. That doesn’t necessarily mean the group forecasts are good. (We’ll explore this subject in more depth later in the book.) But it does mean that you can benefit from applying multiple perspectives toward a problem.
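As a rough intuition pump (and not a reproduction of those studies), the simulation below averages several forecasters whose errors are partly shared, since real forecasters draw on much of the same information; the error sizes are invented so the improvement lands in roughly that range.

```python
# Toy simulation: forecasters share part of their error (common information)
# and add their own idiosyncratic error. Averaging cancels only the latter,
# so the group beats a typical individual by a modest margin.
# All error sizes are hypothetical.
import random

random.seed(1)
TRUE_VALUE = 63              # e.g., the actual seat gain being forecast
N_FORECASTERS, N_TRIALS = 10, 5_000

individual_err = group_err = 0.0
for _ in range(N_TRIALS):
    shared = random.gauss(0, 8)          # error common to every forecaster
    forecasts = [TRUE_VALUE + shared + random.gauss(0, 6)
                 for _ in range(N_FORECASTERS)]
    individual_err += abs(forecasts[0] - TRUE_VALUE)
    group_err += abs(sum(forecasts) / N_FORECASTERS - TRUE_VALUE)

print("Average individual error:", round(individual_err / N_TRIALS, 1))
print("Average group error:     ", round(group_err / N_TRIALS, 1))
```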