Everything Is Obvious
6. To be precise, we had different amounts of data for each of the methods—for example, our own polls were conducted over only the 2008–2009 season, whereas we had nearly thirty years of Vegas data, and TradeSports predictions ended in November 2008, when it was shut down—so we couldn’t compare all six methods over any given time interval. Nevertheless, for any given interval, we were always able to compare multiple methods. See Goel, Reeves, et al. (2010) for details.
7. In this case, the model was based on the number of screens the movie was projected to open on, and the number of people searching for it on Yahoo! the week before it opened. See Goel, Reeves, et al. (2010) for details. See Sunstein (2005) for more details on the Hollywood Stock Exchange and other prediction markets.
8. See Erikson and Wlezien (2008) for details of their comparison between opinion polls and the Iowa Electronic Markets.
9. Ironically, the problem with experts is not that they know too little, but rather that they know too much. As a result, they are better than nonexperts at wrapping their guesses in elaborate rationalizations that make them seem more authoritative, but are in fact no more accurate. See Payne, Bettman, and Johnson (1992) for more details of how experts reason. Not knowing anything, however, is also bad, because without a little expertise, one has trouble even knowing what one ought to be making guesses about. For example, while most of the attention paid to Tetlock’s study of expert prediction was directed at the surprisingly poor performance of the experts—who, remember, were more accurate when making predictions outside their area of expertise than in it—Tetlock also found that predictions made by naïve subjects (in this case university undergraduates) were significantly worse than those of the experts. The correct message of Tetlock’s study, therefore, was not that experts are no better than anyone at making predictions, but rather that someone with only general knowledge of the subject, but not no knowledge at all, can outperform someone with a great deal of knowledge. See Tetlock (2005) for details.
10. Spyros Makridakis and colleagues have shown in a series of studies over the years (Makridakis and Hibon 2000; Makridakis et al. 1979; Makridakis et al. 2009b) that simple models are about as accurate as complex models in forecasting economic time series. Armstrong (1985) also makes this point.
11. See Dawes (1979) for a discussion of simple linear models and their usefulness to decision making.
12. See Mauboussin (2009, Chapters 1 and 3) for an insightful discussion on how to improve predictions, along with traps to be avoided.
13. The simplest case occurs when the distribution of probabilities is what statisticians call stationary, meaning that its properties are constant over time. A more general version of the condition allows the distribution to change as long as changes in the distribution follow a predictable trend, such as average house prices increasing steadily over time. However, in either case, the past is assumed to be a reliable predictor of the future.
14. Possibly if the models had included data from a much longer stretch of time—the past century rather than the past decade or so—they might have captured more accurately the probability of a large, rapid, nationwide downturn. But so many other aspects of the economy also changed over that period of time that it’s not clear how relevant much of this data would have been. Presumably, in fact, that’s why the banks decided to restrict the time window of their historical data the way they did.
15. See Raynor (2007, Chapter 2) for the full story.
16. Sony did in fact pursue a partnership with Matsushita, but abandoned the plan in light of Matsushita’s quality problems. Sony therefore opted for product quality while Matsushita opted for low cost—both reasonable strategies that had a chance of succeeding.
17. As Raynor writes, “Sony’s strategies for Betamax and MiniDisc had all the elements of success, but neither succeeded. The cause of these failures was, simply put, bad luck: the strategic choices Sony made were perfectly reasonable; they just turned out to be wrong.” (p. 44).
18. For an overview of the history of scenario planning, see Millet (2003). For theoretical discussions, see Brauers and Weber (1988), Schoemaker (1991), Perrottet (1996), and Wright and Goodwin (2009). Scenario planning also closely resembles what Makridakis, Hogarth and Gaba (2009a) call “future perfect thinking.”
19. For details of Pierre Wack’s work at Royal Dutch/Shell, see Wack (1985a; 1985b).
20. Raynor actually distinguishes three kinds of management: functional management, which is about optimizing daily tasks; operational management, which is focused on executing existing strategies; and strategic management, which is focused on the management of strategic uncertainty. (Raynor 2007, pp. 107–108)
21. For example, a 2010 story about Ford’s then CEO claimed that “What Ford won’t do is change direction again, at least not under Mr. Mulally’s watch. He promises that he—and Ford’s 200,000 employees—will not waver from his ‘point of view’ about the future of the auto industry. ‘That is what strategy is all about,’ he says. ‘It’s about a point of view about the future and then making decisions based on that. The worst thing you can do is not have a point of view, and not make decisions.’” New York Times, January 9, 2010.
22. This example was originally presented in Beck (1983), but my discussion of it is based on the analysis by Schoemaker (1991).
23. According to Schoemaker (1991, p. 552), “A deeper scenario analysis would have recognized the confluence of special circumstances (e.g. high oil prices, tax incentives for drilling, conducive interest rates, etc.) underlying this temporary peak. Good scenario planning goes beyond just high-low projections.”
24. See Raynor (2007, p. 37).
CHAPTER 8: THE MEASURE OF ALL THINGS
1. Some more details about Zara’s supply chain management are provided in a Harvard Business Review case study of the company (2004, pp. 69–70). Additional details are provided in Kumar and Linguri (2006).
2. Mintzberg, it should be noted, was careful to differentiate strategic planning from “operational” planning, which is concerned with short-term optimization of existing procedures. The kind of planning models that don’t work for strategic plans actually do work quite well for operational planning—indeed, it was for operational planning that the models were originally developed, and it was their success in this context that Mintzberg believed had encouraged planners to repurpose them for strategic planning. The problem is therefore not that planning of any kind is impossible, any more than prediction of any kind is impossible, but rather that certain kinds of plans can be made reliably and others can’t be, and that planners need to be able to tell the difference.
3. See Helft (2008) for a story about the Yahoo! home page overhaul.
4. See Kohavi et al. (2010) and Tang et al. (2010).
5. See Clifford (2009) for a story about startup companies using quantitative performance metrics to substitute for design instinct.
6. See Alterman (2008) for Peretti’s original description of the Mullet Strategy. See Dholakia and Vianello (2009) for a discussion of how the same approach can work for communities built around brands, and the associated tradeoff between control and insight.
7. See Howe (2008, 2006) for a general discussion of crowdsourcing. See Rice (2010) for examples of recent trends in online journalism.
8. See Clifford (2010) for more details on Bravo, and Wortman (2010) for more details on Cheezburger Network. See http://bit.ly/9EAbjR for an interview with Jonah Peretti about contagious media and BuzzFeed, which he founded.
9. See http://blog.doloreslabs.com for many innovative uses of crowdsourcing.
10. See Paolacci et al. (2010) for details of turker demographics and motivations. See Kittur et al. (2008) and Snow et al. (2008) for studies of Mechanical Turk reliability. And see Sheng, Provost, and Ipeirotis (2008) for a method for improving turker reliability.
11. See Polgreen et al. (2008) and Ginsberg et al. (2008) for details of the influenza studies. Recently, the CDC has reduced its reporting delay for influenza caseloads (Mearian 2009), somewhat undermining the time advantages of search-based surveillance.
12. The Facebook happiness index is available at http://apps.facebook.com/usa-gnh. See also Kramer (2010) for more details. A similar approach has been used to extract happiness indices from song lyrics and blog postings (Dodds and Danforth 2009) as well as Twitter updates (Bollen et al. 2009).
13. See http://yearinreview.yahoo.com/2009 for a compilation of most popular searches in 2009. Facebook has a similar service based on status updates, as does Twitter. As some commenters have noted (http://www.collisiondetection.net/mt/archives/2010/01/the_problem_wit.php), these lists often produce rather banal results, and so possibly would be more interesting or useful if constrained to more specific subpopulations of interest to particular individuals—like his or her friends, for example. Fortunately, modifications like this are relatively easy to implement; thus the fact that topics of highest average interest are unsurprising or banal does not imply that the capability to reflect collective interest is itself uninteresting.
14. See Choi and Varian (2008) for more examples of “predicting the present” using search trends.
15. See Goel, Lahaie, Hofman, et al. (2010) for details of using web search to make predictions.
16. Steve Hasker and I wrote about this approach to planning in marketing a few years ago in the Harvard Business Review (Watts and Hasker 2006).
17. The relationship between sales and advertising is in fact a textbook example of what economists call the endogeneity problem (Berndt 1991).
18. In fact, there was a time when controlled experiments of this kind enjoyed a brief burst of enthusiasm among advertisers, and some marketers, especially in the direct-mail world, still run them. In particular, Leonard Lodish and colleagues conducted a series of advertising experiments, mostly in the early 1990s using split cable TV (Abraham and Lodish 1990; Lodish et al. 1995a; Lodish et al. 1995b; and Hu et al. 2007). Also see Bertrand et al. (2010) for an example of a direct-mail advertising experiment. Curiously, however, the practice of routinely including control groups in advertising campaigns, for TV, word-of-mouth, and even brand advertising, never caught on, and these days it is mostly overlooked in favor of statistical models, often called “marketing mix models” (http://en.wikipedia.org/wiki/Marketing_mix_modeling).
19. See, for example, a recent Harvard Business School article by the president and CEO of comScore (Abraham 2008). Curiously, the author was one of Lodish’s colleagues who worked on the split-cable TV experiments.
20. User anonymity was maintained throughout the experiment by using a third-party service to match Yahoo! and retailer IDs without disclosing individual identities to the researchers. See Lewis and Reiley (2009) for details.
21. More effective advertising may even be better for the rest of us. If you only saw ads when there was a chance you might be persuaded by them, you’d probably see many fewer ads, and possibly wouldn’t find them as annoying.
22. See Brynjolfsson and Schrage (2009). Department stores have long experimented with product placement, trying out different locations or prices for the same product in different stores to learn which arrangements sell the most. But now that virtually all physical products are labeled with unique barcodes, and many also contain embedded RFID chips, retailers have the potential to track inventory and measure variation between stores, regions, times of the day, or times of the year—possibly leading to what Marshall Fisher of the University of Pennsylvania Wharton School has called the era of “Rocket Science” retailing (Fisher 2009). Ariely (2008) has also made a similar point.
23. See http://www.povertyactionlab.org/ for information on the MIT Poverty Action Lab. See Arceneaux and Nickerson (2009) and Gerber et al. (2009) for examples of field experiments run by political scientists. See Lazear (2000) and Bandiera, Barankay, and Rasul (2009) for examples of field experiments run by labor economists. See O’Toole (2007, p. 342) for the example of the national parks and Ostrom (1999, p. 497) for a similar attitude to common pool resource governance, in which she argues that “all policy proposals must be considered as experiments.” Finally, see Ayres (2007, chapter 3) for other examples of field experiments.
24. Ethical considerations also limit the scope of experimental methods. For example, although the Department of Education could randomly assign students to different schools, and while that would probably be the best way to learn which education strategies really work, doing so would impose hardship on the students who were assigned to the bad schools, and so would be unethical. If you have a reasonable suspicion that something might be harmful, you cannot ethically force people to experience it even if you’re not sure; nor can you ethically refuse them something that might be good for them. All of this is as it should be, but it necessarily limits the range of interventions to which aid and development agencies can assign people or regions randomly, even if they could do so practically.
25. For specific quotes, see Scott (1998) pp. 318, 313, and 316, respectively.
26. See Leonhardt (2010) for a discussion of the virtues of cap and trade. See Hayek (1945) for the original argument.
27. See Brill (2010) for an interesting journalistic account of the Race to the Top. See Booher-Jennings (2005) and Ravitch (2010) for critiques of standardized testing as the relevant metric for student performance and teacher quality.
28. See Heath and Heath (2010) for their definition of bright spots. See Marsh et al. (2004) for more details of the positive deviance approach. Examples of positive deviance can be found at http://www.positivedeviance.org/. The hand-washing story is taken from Gawande (2008, pp. 13–28), who describes an initial experiment run in Pittsburgh. Gawande cautions that it is still uncertain how well the initial results will last, or whether they will generalize to other hospitals; however, a recent controlled experiment (Marra et al. 2010) suggests that they might.
29. See Sabel (2007) for a description of bootstrapping. See Watts (2003, Chapter 9) for an account of Toyota’s near catastrophe with “just in time” manufacturing, and also their remarkable recovery. See Nishiguchi and Beaudet (2000) for the original account. See Helper, MacDuffie, and Sabel (2000) for a discussion of how the principles of the Toyota production system have been adopted by American firms.
30. See Sabel (2007) for more details on what makes for successful industrial clusters, and Giuliani, Rabellotti, and van Dijk (2005) for a range of case studies. See Lerner (2009) for cautionary lessons in government attempts to stimulate innovation.
31. Of course, in attempting to generalize local solutions, one must remain sensitive to the context in which they are used. Just because a particular hand-washing practice works in one hospital does not necessarily mean that it will work in another, where a different set of resources, constraints, problems, patients, and cultural attitudes may prevail. We don’t always know when a solution can be applied more broadly—in fact, it is precisely this unpredictability that makes central bureaucrats and administrators unable to solve the problem in the first place. Nevertheless, finding and generalizing local solutions should be the focus of the plan.
32. Easterly (2006, p. 6).
CHAPTER 9: FAIRNESS AND JUSTICE
1. Herrera then sued the city, which in 2006 eventually settled for $1.5 million. Three other officers who were involved in the incident were fired, and overall seventeen members of the 72nd precinct, including the commander, were disciplined. Police Commissioner Kerik opened an investigation into the operation of the midnight shift, which was apparently known to suffer from poor supervision and lax routines. Both Mayor Giuliani and his successor, Michael Bloomberg, weighed in on the case, as did Governor Pataki. The legal status of the unborn baby Ricardo resulted in a fight between the medical examiner, who claimed the baby did not live independently of its mother and was therefore not to be considered a separate death, and the district prosecutor, who claimed the opposite. From the initial reports of the accident through the settlement of the lawsuit, the New York Times published nearly forty articles about the tragedy.
2. For a discussion of the relationship between rational organizing principles and the actual functioning of real social organizations, see Meyer and Rowan (1977), DiMaggio and Powell (1983), and Dobbin (1994). For a comprehensive treatment of the “new institutionalist” view of organizational sociology, see Powell and DiMaggio (1991).
3. See Menand (2001, pp. 429–33) for a discussion of Wendell Holmes’s reasoning.
4. The psychologist Ed Thorndike was the first to document the Halo Effect in psychological evaluations (Thorndike 1920). For a review of the psychological literature on the Halo Effect, see Cooper (1981). For the John Adams quote, see Higginbotham (2001, p. 216).
5. For more examples of the Halo Effect in business, see Rosenzweig (2007). For a glowing story about the success of Steve & Barry’s, see Wilson (2008). For a story about their subsequent bankruptcy, see Sorkin (2008).
6. See Rosenzweig (2007, pp. 54–56) for more examples of attribution error, and Staw (1975) for details of the experiment that Rosenzweig discusses.
7. To illustrate, consider a simple thought experiment in which we compare a “good” process, G, with a “bad” process, B, and where, just for the sake of the example, G has a 60 percent chance of success, while B succeeds only 40 percent of the time. If you think this isn’t a big difference, imagine two roulette wheels that produced red outcomes 60 percent and 40 percent of the time—betting on red and black, respectively, one could quickly and easily make a fortune. Likewise, a strategy for making money in financial markets by placing many small bets would do very well if it paid out equal amounts of money 60 percent of the time, and lost them 40 percent of the time. But imagine now that instead of spinning a roulette wheel—a process we can repeat many times—our processes correspond to alternative corporate strategies or education policies. This now being an experiment that can be run only once, we observe the following probabilities