Everything Is Obvious

by Duncan J. Watts


  When we first told some prediction market researchers about this result, their reaction was that it must reflect some special feature of football. The NFL, they argued, has lots of rules like salary caps and draft picks that help to keep teams as equal as possible. And football, of course, is a game where the result can be decided by tiny random acts, like the wide receiver dragging in the quarterback’s desperate pass with his fingertips as he runs full tilt across the goal line to win the game in its closing seconds. Football games, in other words, have a lot of randomness built into them—arguably, in fact, that’s what makes them exciting. Perhaps it’s not so surprising after all, then, that all the information and analysis that is generated by the small army of football pundits who bombard fans with predictions every week is not superhelpful (although it might be surprising to the pundits). In order to be persuaded, our colleagues insisted, we would have to find the same result in some other domain for which the signal-to-noise ratio might be considerably higher than it is in the specific case of football.

  OK, what about baseball? Baseball fans pride themselves on their near-fanatical attention to every measurable detail of the game, from batting averages to pitching rotations. Indeed, an entire field of research called sabermetrics has developed specifically for the purpose of analyzing baseball statistics, even spawning its own journal, the Baseball Research Journal. One might think, therefore, that prediction markets, with their far greater capacity to factor in different sorts of information, would outperform simplistic statistical models by a much wider margin for baseball than they do for football. But that turns out not to be true either. We compared the predictions of the Las Vegas sports betting markets over nearly twenty thousand Major League Baseball games played from 1999 to 2006 with a simple statistical model based again on home-team advantage and the recent win-loss records of the two teams. This time, the difference between the two was even smaller—in fact, the performances of the market and the model were indistinguishable. In spite of all the statistics and analysis, in other words, and in spite of the absence of meaningful salary caps in baseball and the resulting concentration of superstar players on teams like the New York Yankees and Boston Red Sox, the outcomes of baseball games are even closer to random events than those of football games.
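
  To give a sense of what such a head-to-head comparison involves, here is a minimal sketch in Python. The game records and probabilities below are hypothetical stand-ins, not the actual data from the study described above; the point is only that once you have each method's predicted probability of a home-team win and the actual result, scoring a market against a simple model comes down to a few lines of arithmetic, in this case the Brier score (the average squared gap between the predicted probability and what happened, where lower is better and 0.25 is what constant fifty-fifty guessing earns).

    # Illustrative sketch: score a betting market's and a simple model's
    # home-win probabilities against actual outcomes.  All numbers are made up.
    def brier_score(probs, outcomes):
        return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

    # Each entry: (market probability, model probability, 1 if home team won).
    games = [
        (0.62, 0.58, 1),
        (0.45, 0.51, 0),
        (0.70, 0.60, 1),
        (0.55, 0.54, 0),
    ]
    market = brier_score([g[0] for g in games], [g[2] for g in games])
    model = brier_score([g[1] for g in games], [g[2] for g in games])
    print("market:", round(market, 3), " model:", round(model, 3))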

  Since then, we have either found or learned about the same kind of result for other kinds of events that prediction markets have been used to predict, from the opening weekend box office revenues for feature films to the outcomes of presidential elections. Unlike sports, these events occur without any of the rules or conditions that are designed to make sports competitive. There is also a lot of relevant information that prediction markets could conceivably exploit to boost their performance well beyond that of a simple model or a poll of relatively uninformed individuals. Yet when we compared the Hollywood Stock Exchange (HSX)—one of the most popular prediction markets, which has a reputation for accurate prediction—with a simple statistical model, the HSX did only slightly better.7 And in a separate study of the outcomes of five US presidential elections from 1988 to 2004, political scientists Robert Erikson and Christopher Wlezien found that a simple statistical correction of ordinary opinion polls outperformed even the vaunted Iowa Electronic Markets.8

  TRUST NO ONE, ESPECIALLY YOURSELF

  So what’s going on here? We are not really sure, but our suspicion is that the strikingly similar performance of different methods is an unexpected side effect of the prediction puzzle from the previous chapter. On the one hand, when it comes to complex systems—whether they involve sporting matches, elections, or movie audiences—there are strict limits to how accurately we can predict what will happen. But on the other hand, it seems that one can get pretty close to the limit of what is possible with relatively simple methods. By analogy, if you’re handed a weighted die, you might be able to figure out which sides will come up more frequently in a few dozen rolls, after which you would do well to bet on those outcomes. But beyond that, more elaborate methods like studying the die under a microscope to map out all the tiny fissures and irregularities on its surface, or building a complex computer simulation, aren’t going to help you much in improving your prediction.
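
  The die analogy is easy to make concrete. The short Python sketch below uses a made-up bias (face six comes up twice as often as any other face) to show that simply counting the faces over a few dozen rolls already reveals which outcomes to bet on; further scrutiny of the die adds little.

    import random
    from collections import Counter

    # A hypothetical weighted die: face 6 is twice as likely as any other face.
    def roll():
        return random.choices(range(1, 7), weights=[1, 1, 1, 1, 1, 2])[0]

    # Estimate the bias empirically from a few dozen rolls.
    counts = Counter(roll() for _ in range(50))
    print("observed frequencies:", dict(sorted(counts.items())))
    print("bet on face:", counts.most_common(1)[0][0])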

  In the same way, we found that with football games a single piece of information—that the home team wins slightly more than half the time—is enough to boost one’s performance in predicting the outcome above random guessing. In addition, a second simple insight, that the team with the better win-loss record should have a slight advantage, gives you another significant boost. Beyond that, however, all the additional information you might consider gathering—the recent performance of the quarterback, the injuries on the team, the girlfriend troubles of the star running back—will only improve your predictions incrementally at best. Predictions about complex systems, in other words, are highly subject to the law of diminishing returns: The first pieces of information help a lot, but very quickly you exhaust whatever potential for improvement exists.
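
  The same flattening can be seen in a toy simulation. The Python sketch below, with entirely made-up parameters, generates synthetic games in which the outcome is mostly noise plus a little signal, and then compares three predictors: one that knows only that the home team has an edge, one that also knows the teams' recent records, and an oracle that knows the true win probability of every game. The oracle is the ceiling that no amount of extra information can exceed, and the two-fact predictor typically lands only a little below it.

    import random

    random.seed(0)
    HOME_EDGE = 0.04  # the home team wins slightly more than half the time

    def simulate_game():
        # True team strengths, plus noisy "recent records" observed for each.
        s_home, s_away = random.gauss(0, 0.1), random.gauss(0, 0.1)
        r_home = s_home + random.gauss(0, 0.05)
        r_away = s_away + random.gauss(0, 0.05)
        # True home-win probability: a little signal on top of a lot of noise.
        p_home = min(max(0.5 + HOME_EDGE + (s_home - s_away), 0.05), 0.95)
        home_won = random.random() < p_home
        return r_home, r_away, p_home, home_won

    games = [simulate_game() for _ in range(20000)]

    def accuracy(pick_home):
        return sum(pick_home(g) == g[3] for g in games) / len(games)

    print("always pick the home team:  ", accuracy(lambda g: True))
    print("home edge + recent records: ", accuracy(lambda g: g[0] + HOME_EDGE > g[1]))
    print("perfect information:        ", accuracy(lambda g: g[2] > 0.5))

  The exact numbers are artifacts of the simulation, but the pattern is the point: the first fact lifts you above coin flipping, the second lifts you noticeably further, and even perfect knowledge of every game's true probability adds only a little on top of that.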

  Of course, there are circumstances in which we may care about very small improvements in prediction accuracy. In online advertising or high-frequency stock trading, for example, one might be making millions or even billions of predictions every day, and large sums of money may be at stake. Under these circumstances, it’s probably worth the effort and expense to invest in sophisticated methods that can exploit the subtlest patterns. But in just about any other business, from making movies or publishing books to developing new technologies, where you get to make only dozens or at most hundreds of predictions a year, and where the predictions you are making are usually just one aspect of your overall decision-making process, you can probably predict about as well as possible with the help of a relatively simple method.

  The one method you don’t want to use when making predictions is to rely on a single person’s opinion—especially not your own. The reason is that although humans are generally good at perceiving which factors are potentially relevant to a particular problem, they are generally bad at estimating how important one factor is relative to another. In predicting the opening weekend box office revenue for a movie, for example, you might think that variables such as the movie’s production and marketing budgets, the number of screens on which it will open, and advance ratings by reviewers are all highly relevant—and you’d be correct. But how much should you weight a slightly worse-than-average review against an extra $10 million marketing budget? It isn’t clear. Nor is it clear, when deciding how to allocate a marketing budget, how much people will be influenced by the ads they see online or in a magazine versus what they hear about the product from their friends—even though all these factors are likely to be relevant.

  You might think that making these sorts of judgments accurately is what experts would be good at, but as Tetlock showed in his experiment, experts are just as bad at making quantitative predictions as nonexperts and maybe even worse.9 The real problem with relying on experts, however, is not that they are appreciably worse than nonexperts, but rather that because they are experts we tend to consult only one at a time. Instead, what we should do is poll many individual opinions—whether experts or not—and take the average. Precisely how you do this, it turns out, may not matter so much. With all their fancy bells and whistles, prediction markets may produce slightly better predictions than a simple method like a poll, but the difference between the two is much less important than the gain from simply averaging lots of opinions somehow. Alternatively, one can estimate the relative importance of the various predictors directly from historical data, which is really all a statistical model accomplishes. And once again, although a fancy model may work slightly better than a simple model, the difference is small relative to using no model at all.10 At the end of the day, both models and crowds accomplish the same objective. First, they rely on some version of human judgment to identify which factors are relevant to the prediction in question. And second, they estimate and weight the relative importance of each of these factors. As the psychologist Robyn Dawes once pointed out, “the whole trick is to know what variables to look at and then know how to add.”11
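
  Dawes's trick can be written out in a handful of lines. The sketch below, in Python with made-up numbers, shows both versions discussed above: simply averaging several people's forecasts of a film's opening weekend, and fitting weights for a few relevant variables to historical data with an ordinary least-squares fit. Neither amounts to more than deciding which variables to look at and then adding them up with sensible weights.

    import numpy as np

    # (a) The crowd version: average several independent forecasts
    # (hypothetical opening-weekend estimates, in millions of dollars).
    forecasts = [42.0, 55.0, 38.0, 61.0, 47.0]
    print("crowd average:", sum(forecasts) / len(forecasts))

    # (b) The model version: learn the weights from (made-up) past films.
    # Columns: marketing budget ($M), opening screens (thousands), review score.
    X = np.array([[20, 2.5, 6.1],
                  [35, 3.2, 5.4],
                  [50, 3.8, 7.0],
                  [15, 1.9, 8.2],
                  [60, 4.1, 6.5]], dtype=float)
    y = np.array([31, 48, 70, 29, 82], dtype=float)   # opening revenue ($M)

    X1 = np.column_stack([np.ones(len(X)), X])         # add an intercept term
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)   # "know how to add"
    new_film = np.array([1, 40, 3.0, 5.8])             # intercept + features
    print("model prediction:", round(float(new_film @ weights), 1))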

  By applying this trick consistently, one can also learn over time which predictions can be made with relatively low error, and which cannot. All else being equal, for example, the further in advance you predict the outcome of an event, the larger your error will be. It is simply harder to predict the box office potential of a movie at the green-light stage than a week or two before its release, no matter what methods you use. In the same way, predictions about new product sales, say, are likely to be less accurate than predictions about the sales of existing products no matter when you make them. There’s nothing you can do about that, but what you can do is start using any one of several different methods—or even use all of them together, as we did in our study of prediction markets—and keep track of their performance over time. As I mentioned at the beginning of the previous chapter, keeping track of our predictions is not something that comes naturally to us: We make lots of predictions, but rarely check back to see how often we got them right. But keeping track of performance is possibly the most important activity of all—because only then can you learn how accurately it is possible to predict, and therefore how much weight you should put on the predictions you make.12
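
  The bookkeeping involved in keeping track is trivial; the discipline is the hard part. As a purely illustrative sketch, the Python below records each prediction alongside the method that produced it and the eventual outcome, then computes each method's average error, which is exactly the information you need to decide how much weight its future predictions deserve.

    from collections import defaultdict

    # Hypothetical log entries: (method, predicted value, actual outcome).
    log = [
        ("poll",   52.0, 49.0),
        ("market", 51.0, 49.0),
        ("model",  48.5, 49.0),
        ("poll",   44.0, 47.0),
        ("market", 46.5, 47.0),
        ("model",  47.5, 47.0),
    ]

    errors = defaultdict(list)
    for method, predicted, actual in log:
        errors[method].append(abs(predicted - actual))

    for method, errs in sorted(errors.items()):
        print(f"{method:7s} mean absolute error: {sum(errs) / len(errs):.2f}")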

  FUTURE SHOCK

  No matter how carefully you adhere to this advice, a serious limitation with all prediction methods is that they are only reliable to the extent that the same kinds of events will happen in the future as happened in the past, and with the same average frequency.13 In regular times, for example, credit card companies may be able to do a pretty good job of predicting default rates. Individual people may be complicated and unpredictable, but they tend to be complicated and unpredictable in much the same way this week as they were last week, and so on average the models work reasonably well. But as many critics of predictive modeling have pointed out, many of the outcomes that we care about most—like the onset of the financial crisis, the emergence of a revolutionary new technology, the overthrow of an oppressive regime, or a precipitous drop in violent crime—are interesting to us precisely because they do not happen in regular times. And in these situations some very serious problems arise from relying on historical data to predict future outcomes—as a number of credit card companies discovered when default rates soared in the aftermath of the recent financial crisis.

  Even more important, the models that many banks were using to price mortgage-backed derivatives prior to 2008—like the infamous CDOs—now seem to have relied too much on data from the recent past, during which time housing prices had only gone up. As a result, ratings analysts and traders alike collectively placed too low a probability on a nationwide drop in real-estate values, and so badly underestimated the risk of mortgage defaults and foreclosures.14 At first, it might seem that this would have been a perfect application for prediction markets, which might have done a better job of anticipating the crisis than all the “quants” working in the banks. But in fact it would have been precisely these people—along with the politicians, government regulators, and other financial market specialists who also failed to anticipate the crisis—who would have been participating in the prediction market, so it’s unlikely that the wisdom of crowds would have been any help at all. Arguably, in fact, it was precisely the “wisdom” of the crowd that got us into the mess in the first place. So if models, markets, and crowds can’t help predict black swan events like the financial crisis, then what are we supposed to do about them?

  A second problem with methods that rely on historical data is that big, strategic decisions are not made frequently enough to benefit from a statistical approach. It may be the case, historically speaking, that most wars end poorly, or that most corporate mergers don’t pay off. But it may also be true that some military interventions are justified and that some mergers succeed, and it may be impossible to tell the difference in advance. If you could make millions, or even just hundreds, of such bets, it would make sense to go with the historical probabilities. But when facing a decision about whether or not to lead the country into war, or to make some strategic acquisition, you cannot count on getting more than one attempt. Even if you could measure the probabilities, therefore, the difference between a 60 percent and a 40 percent probability of success may not be terribly meaningful.

  Like anticipating black swans, making one-off strategic decisions is therefore ill suited to statistical models or crowd wisdom. Nevertheless, these sorts of decisions have to get made all the time, and they are potentially the most consequential decisions that anyone makes. Is there a way to improve our success here as well? Unfortunately, there’s no clear answer to this question. A number of approaches have been tried over the years, but none of them has a consistently successful track record. In part that’s because the techniques can be difficult to implement correctly, but mostly it’s because of the problem raised in the previous chapter—that there is simply a level of uncertainty about the future that we’re stuck with, and this uncertainty inevitably introduces errors into the best-laid plans.

  THE STRATEGY PARADOX

  Ironically, in fact, the organizations that embody what would seem to be the best practices in strategy planning—organizations, for example, that possess great clarity of vision and that act decisively—can also be the most vulnerable to planning errors. The problem is what strategy consultant and author Michael Raynor calls the strategy paradox. In his book of the same name, Raynor illustrates the paradox by revisiting the case of Sony’s Betamax videocassette, which famously lost out to the cheaper, lower-quality VHS technology developed by Matsushita. According to conventional wisdom, Sony’s blunder was twofold: First, they focused on image quality over running time, thereby conceding VHS the advantage of being able to tape full-length movies. And second, they designed Betamax to be a standalone format, whereas VHS was “open,” meaning that multiple manufacturers could compete to make the devices, thereby driving down the price. As the video-rental market exploded, VHS gained a small early lead in market share, and this small lead then grew rapidly through a process of cumulative advantage. The more people bought VHS recorders, the more stores stocked VHS tapes, and vice versa. The result over time was near-total saturation of the market by the VHS format and a humiliating defeat for Sony.15

  What the conventional wisdom overlooks, however, is that Sony’s vision of the VCR wasn’t as a device for watching rented movies at all. Rather, Sony expected people to use VCRs to tape TV shows, allowing them to watch their favorite shows at their leisure. Considering the exploding popularity of digital video recorders that are now used for precisely this purpose, Sony’s view of the future wasn’t implausible at all. And if it had come to pass, the superior picture quality of Betamax might well have made up for the extra cost, while the shorter taping time may have been irrelevant.16 Nor was it the case that Matsushita had any better inkling than Sony of how fast the video-rental market would take off—indeed, an earlier experiment in movie rentals by the Palo Alto–based firm CTI had failed dramatically. Regardless, by the time it had become clear that home movie viewing, not taping TV shows, would be the killer app of the VCR, it was too late. Sony did their best to correct course, and in fact very quickly produced a longer-playing BII version, eliminating the initial advantage held by Matsushita. But it was all to no avail. Once VHS got a sufficient market lead, the resulting network effects were impossible to overcome. Sony’s failure, in other words, was not really the strategic blunder it is often made out to be, resulting instead from a shift in consumer demand that happened far more rapidly than anyone in the industry had anticipated.

  Shortly after their debacle with Betamax, Sony made another big strategic bet on recording technology—this time with their MiniDisc players. Determined not to make the same mistake twice, Sony paid careful attention to where Betamax had gone wrong, and did their best to learn the appropriate lessons. In contrast with Betamax, Sony made sure that MiniDiscs had ample capacity to record whole albums. And mindful of the importance of content distribution to the outcome of the VCR wars, they acquired their own content repository in the form of Sony Music. At the time they were introduced in the early 1990s, MiniDiscs held clear technical advantages over the then-dominant CD format. In particular, the MiniDiscs could record as well as play, and because they were smaller and more resistant to jolts they were better suited to portable devices. Recordable CDs, by contrast, required entirely new machines, which at the time were extremely expensive.

  By all reasonable measures the MiniDisc should have been an outrageous success. And yet it bombed. What happened? In a nutshell, the Internet happened. The cost of memory plummeted, allowing people to store entire libraries of music on their personal computers. High-speed Internet connections allowed for peer-to-peer file sharing. Flash memory allowed for easy downloading to portable devices. And new websites for finding and downloading music abounded. The explosive growth of the Internet was not driven by the music business in particular, nor was Sony the only company that failed to anticipate the profound effect that the Internet would have on the production, distribution, and consumption of music. Nobody did. Sony, in other words, really was doing the best that anyone could have done to learn from the past and to anticipate the future—but they got rolled anyway, by forces beyond anyone’s ability to predict or control.

  Surprisingly, the company that “got it right” in the music industry was Apple, with their combination of the iPod player and their iTunes store. In retrospect, Apple’s strategy looks visionary, and analysts and consumers alike fall over themselves to pay homage to Apple’s dedication to design and quality. Yet the iPod was exactly the kind of strategic play that the lessons of Betamax, not to mention Apple’s own experience in the PC market, should have taught them would fail. The iPod was large and expensive. It was based on a closed architecture that Apple refused to license, ran on proprietary software, and was actively resisted by the major content providers. Nevertheless, it was a smashing success. So in what sense was Apple’s strategy better than Sony’s? Yes, Apple had made a great product, but so had Sony. Yes, they looked ahead and did their best to see which way the technological winds were blowing, but so did Sony. And yes, once they made their choices, they stuck to them and executed brilliantly; but that’s exactly what Sony did as well. The only important difference, in Raynor’s view, was that Sony’s choices happened to be wrong while Apple’s happened to be right.17

 
