19 J. M. McPherson, The Battle Cry of Freedom: The Civil War Era (New York: Oxford University Press, 1988).
20 Joel Mokyr, The Economics of the Industrial Revolution (Totowa, NJ: Rowman & Allanheld, 1985).
21 McCloskey, “History, Differential Equations,” 36.
22 Ibid.
23 Arthur Conan Doyle, The Complete Sherlock Holmes (Garden City, NY: Garden City Publishing, 1938).
24 Rosmarie Nagel, “Unraveling in Guessing Games: An Experimental Study,” American Economic Review 85 (1995): 1313–26.
25 Richard Thaler, “From Homo Economicus to Homo Sapiens,” Journal of Economic Perspectives 14 (2000): 133–41.
26 This is no recent insight. Carl von Clausewitz noted how “war most closely resembles a card game.” It helps to be dealt good cards, but seasoned poker players know there is no quicker route to losing one’s shirt than by playing too predictably. (Carl von Clausewitz, On War, ed. and trans. Michael Howard and Peter Paret [Princeton, NJ: Princeton University Press, 1976]; see also Erik Gartzke, “War Is in the Error Term,” International Organization 53 [1999]: 567–87; R. Jervis, “The Future of World Politics,” International Security 16 [1991/1992]: 39–73.)
27 William A. Sherden, The Fortune Sellers (New York: Wiley, 1998).
28 R. Dawes, “The Prediction of the Future Versus an Understanding of the Past: A Basic Asymmetry,” American Journal of Psychology 106 (1993): 1–24.
29 B. Brehmer, “In One Word: Not from Experience,” Acta Psychologica 45 (1980): 223–41; P. D. Werner, T. L. Rose, and J. A. Yesavage, “Reliability, Accuracy, and Decision-Making Strategy in Clinical Predictions of Imminent Dangerousness,” Journal of Consulting and Clinical Psychology 51 (1983): 815–25; H. Einhorn and R. Hogarth, “Confidence in Judgment: Persistence of the Illusion of Validity,” Psychological Review 85 (1978): 395–416.
30 R. Dawes, “Behavioral Decision Making and Judgment,” in The Handbook of Social Psychology, 4th ed., vol. 1, ed. D. T. Gilbert, S. T. Fiske, and G. Lindzey, 497–548 (New York: McGraw-Hill, 1998); S. Fiske and S. Taylor, Social Cognition (New York: McGraw-Hill, 1991). Widespread reliance on simple heuristics is often blamed for the poor showing—although there is vigorous debate over how often these heuristics lead us astray.
31 Robert Jervis, Perception and Misperception in International Politics (Princeton, NJ: Princeton University Press, 1976).
32 P. E. Tetlock, “Social Psychology and World Politics,” in Fiske, Gilbert, and Lindzey, Handbook of Social Psychology.
33 R. E. Neustadt and E. R. May, Thinking in Time: The Uses of History for Decision-Makers (New York: Free Press, 1986).
34 H. Einhorn and R. Hogarth, “Behavioral Decision Theory,” Annual Review of Psychology 31 (1981): 53–88.
35 P. E. Tetlock and P. Visser, “Thinking about Russia: Possible Pasts and Probable Futures,” British Journal of Social Psychology 39 (2000): 173–96; G. W. Breslauer and P. E. Tetlock, Learning in U.S. and Soviet Foreign Policy (Boulder, CO: Westview Press, 1991).
36 R. P. Abelson, E. Aronson, W. McGuire, T. Newcomb, M. Rosenberg, and P. Tannenbaum, eds. Theories of Cognitive Consistency: A Sourcebook (Chicago: Rand McNally, 1968).
37 E. J. Langer, “The Illusion of Control,” Journal of Personality and Social Psychology 32 (1975): 311–28.
38 On the dangers of being “too clever” in pursuit of patterns in random data, see Ward Edwards, “Probability Learning in 1000 Trials,” Journal of Experimental Psychology, 62 (1961): 385–94.
39 Dawes, “Behavioral Decision Making.”
40 J. Koehler, “The Base-Rate Fallacy Reconsidered: Descriptive, Normative, and Methodological Challenges,” Behavioral and Brain Sciences 19 (1996): 1–53.
41 A. Tversky and D. Kahneman, “Extensional Versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment,” Psychological Review 90 (1983): 293–315; on the difficulty of knowing what the relevant reference classes are, see R. Jervis, “Representativeness in Foreign Policy Judgments,” Political Psychology 7 (1986): 483–505.
42 P. G. Allen, “Econometric Forecasting,” in Principles of Forecasting, ed., J. S. Armstrong, 303–62 (Boston: Kluwer, 2001); M. Singer and A. B. Wildavsky, The Real World Order: Zones of Peace, Zones of Turmoil (Chatham, NJ: Chatham House, 1996).
43 This is not to imply that it is easy to figure out which base rates to use. Should we limit ourselves to the recent past, to the region narrowly defined, or to the specific nation in question? Our mindless algorithms use a range of base rates that vary in time span (five, ten, and twenty-five years), regime-type specificity (stable democracies, unstable ones, dictatorships, and so on), and regional specificity (Nigeria, West Africa, Africa, the world).
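The kind of “mindless” base-rate algorithm described in note 43 can be made concrete with a small sketch. The event histories and the `base_rate` helper below are invented for illustration; they are not the study’s data or its actual implementation.

```python
# Toy illustration of a base-rate algorithm that varies its reference class:
# the same event history read through trailing windows of different lengths.

def base_rate(events, window):
    """Fraction of the trailing `window` years in which the event occurred."""
    recent = events[-window:]
    return sum(recent) / len(recent)

# Hypothetical yearly indicators (1 = the outcome, say an irregular leadership
# change, occurred that year) for reference classes of increasing breadth.
histories = {
    "nation": [0] * 22 + [1, 0, 0],    # one event in twenty-five years
    "region": [0, 1, 0, 0, 1] * 5,     # two events every five years
}

for window in (5, 10, 25):
    for name, events in histories.items():
        print(f"{name:>6s}, {window:2d}-year window: {base_rate(events, window):.2f}")
```

Note how the “nation” base rate swings from 0.20 (five-year window) to 0.04 (twenty-five-year window), which is exactly why the choice of reference class is not innocent.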
44 T. Gilovich, R. Vallone, and A. Tversky, “The Hot Hand in Basketball: On the Misperception of Random Sequences,” Cognitive Psychology 17 (1985): 295–314.
45 A. H. Murphy and R. L. Winkler, “Probability Forecasts: A Survey of National Weather Service Forecasters,” Bulletin of the American Meteorological Society 55 (1974): 1449–53; Murphy, “Scalar and Vector Partitions, Parts I and II.”
46 Yates, Judgment and Decision Making.
47 Readers might find it surprising that simple case-specific extrapolation algorithms sometimes performed almost as well as formal time series models (of the generalized autoregressive distributed lag sort). We should expect this type of result whenever the true stochastic process governing the variable being forecasted (call it yt) is approximated by an autoregressive process of order one and no other variables are useful predictors of yt. In this situation, rational forecasters will adopt simple rules such as always predict that the next period’s value will be rho * yt–1 + (1 – rho) * m, where yt–1 is the last period’s value, rho is some constant less than or equal to 1 that indicates the variable’s “persistence,” and m is the unconditional mean to which the variable reverts over time (e.g., when rho = 1, the variable follows a random walk and the forecast is simply yt–1). Only the variable’s past value has predictive usefulness for the future.
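The one-step-ahead forecasting rule in note 47 can be written out directly. The sketch below is a minimal illustration of the AR(1) forecast described there; the function name and the numbers are hypothetical.

```python
def ar1_forecast(y_prev, rho, m):
    """One-step-ahead forecast for a mean-reverting AR(1) process:
    rho * y_{t-1} + (1 - rho) * m. The persistence rho shrinks last
    period's value toward the unconditional mean m."""
    return rho * y_prev + (1 - rho) * m

# When rho = 1 the variable follows a random walk: the best forecast
# is simply last period's value.
assert ar1_forecast(3.0, 1.0, 10.0) == 3.0

# When rho < 1 the forecast pulls last period's value toward m:
print(ar1_forecast(3.0, 0.8, 10.0))  # 0.8 * 3 + 0.2 * 10 = 4.4
```

This is the sense in which only the variable’s past value has predictive usefulness: no other regressor enters the rule.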
48 W. M. Grove and P. Meehl, “Comparative Efficiency of Informal (Subjective, Impressionistic) and Formal (Mechanical, Algorithmic) Prediction Procedures: The Clinical-Statistical Controversy,” Psychology, Public Policy, and Law 2 (1996): 293–323.
49 Readers might notice here a shift in classifications of outcomes—from the value-neutral “change in the direction of more or less of something” to the value-laden “change for the better or worse.” The conversion was generally unproblematic because, in carrying out these value adjustments, we simply adopted the consensus expert perspective on what was good or bad and dropped those variables on which there was sharp disagreement. Experts overwhelmingly agreed, for instance, that change for the better involved greater GDP, lower unemployment, lower central government debt as a percentage of GDP, less armed conflict within or between states, lower corruption, greater transparency, greater political and economic freedom, and fewer nuclear-armed states. The results are not materially different, however, when we allow for individual differences in value perspectives (conservatives, for example, favored more aggressive privatization transitions from socialism than liberals did; euro skeptics preferred abandoning the currency convergence project while euro enthusiasts hoped for the opposite; and disagreements frequently arose over whether leadership change would be desirable in particular countries).
50 Readers who themselves meet the definition of political expert used here (see Methodological Appendix) might feel unfairly singled out. They should not. Although this project was not designed to compare how well experts from various fields perform prediction tasks, we can safely assert that political experts are far from alone in thinking they are better forecasters than they are. The coexistence of highly knowledgeable experts and anemic forecasting performance is a common phenomenon. See C. Camerer and E. Johnson, “The Process-Performance Paradox in Expert Judgment: How Can Experts Know So Much and Predict So Badly?” in Toward a General Theory of Expertise, ed. K. A. Ericsson and J. Smith, 195–217 (New York: Cambridge University Press, 1991); H. Arkes, “Overconfidence in Judgmental Forecasting,” in Principles of Forecasting, ed. J. S. Armstrong, 495–516 (Boston: Kluwer, 2001). It is also fair to say that political experts are not alone in their susceptibility to biases documented in later chapters, including belief perseverance, base-rate neglect, and hindsight bias. See N. V. Dawson, H. R. Arkes, C. Siciliano, R. Blinkhorn, M. Lakshmanan, and M. Petrelli, “Hindsight Bias: An Impediment to Accurate Probability Estimation in Clinicopathologic Conferences,” Medical Decision Making 8 (1988): 259–64; W. D. Gouvier, M. Uddo-Crane, and L. M. Brown, “Base Rates of Post-concussional Symptoms,” Archives of Clinical Neuropsychology 3 (1988): 273–78; G. B. Chapman and A. S. Elstein, “Cognitive Processes and Biases in Medical Decision Making,” in Decision Making in Health Care, ed. G. B. Chapman and F. A. Sonnenberg, 183–210 (Cambridge: Cambridge University Press, 2000).
The comparisons are not, however, always so soothing. There are domains in which experts are remarkably well calibrated and their performance towers over that of dilettantes. See A. H. Murphy and R. L. Winkler, “Probability Forecasting in Meteorology,” Journal of the American Statistical Association 79 (1984): 489–500; G. Keren, “Facing Uncertainty in the Game of Bridge: A Calibration Study,” Organizational Behavior and Human Decision Processes 39 (1987): 98–114.
The differences across domains are probably due to a combination of factors. Quick, unequivocal feedback on the accuracy of predictions promotes superior performance. So too do norms within the profession that encourage confronting, rather than rationalizing away, mistakes (an especially severe shortcoming in the political domain that we document in chapter 4 and for which we propose possible solutions in chapter 8).
CHAPTER 3
Knowing the Limits of One’s Knowledge
FOXES HAVE BETTER CALIBRATION AND DISCRIMINATION SCORES THAN HEDGEHOGS
The fox knows many things, but the hedgehog knows one big thing.
—ISAIAH BERLIN
The test of a first-rate intelligence is the ability to hold two opposing ideas in the mind at the same time, and still retain the ability to function.
—F. SCOTT FITZGERALD
BEHAVIORAL SCIENTISTS often disagree not only over the facts but also over what is worth studying in the first place. What one school of thought dismisses as a minor anomaly of no conceivable interest to any gainfully employed grown-up, another school elevates to center stage. So it is here. The competing perspectives on good judgment in chapter 1—skepticism and meliorism—offer starkly different assessments of the wisdom of searching for correlates of forecasting skill. Skeptics argue that chapter 2 settled the debate: good judgment and good luck are roughly one and the same. Meliorists sigh that chapter 3 finally gets us to the core issue: Why are some people quite consistently better forecasters than others?
Chapter 3 gives meliorism as fair a methodological shake as chapter 2 gave skepticism. The chapter is organized around a two-track approach to the search for good judgment: the rigor of a quantitative, variable-centered approach that follows a clear hypothesis-testing logic, and the richness of a looser qualitative approach that explores the patterns running through the arguments that our forecasters advanced for expecting some outcomes and rejecting others.
The first track is sensitive to skeptics’ concerns about capitalizing on chance. It caps the potentially limitless list of meliorist hunches by targeting only the most influential hypotheses that could be culled from either formal arguments in the research literature or informal comments of research participants. Its strength is the clarity with which it defines previously vague concepts and lays out explicit standards of proof. The second track throws skeptics’ caution to the wind. It pursues the search for the cognitive underpinnings of good judgment in an impressionistic, even opportunistic, fashion. It calls on us to be good listeners, to pick up those patterns of reasoning that distinguish the most from the least accurate forecasters. Its strength is its power to yield evocative accounts of why particular groups outperformed others when they did.
Of course, there is a rigor-richness trade-off. We purchase rigor by shaving off subtleties that are hard to count; we purchase richness by capitalizing on coincidental connections between what experts say and what subsequently happens. Fortunately, the knowledge game is not zero-sum here. In the end, the two search strategies yield reassuringly similar results. We discover that an intellectual trait widely considered a great asset in science—the commitment to parsimony—can be a substantial liability in real-world forecasting exercises.
THE QUANTITATIVE SEARCH FOR GOOD JUDGMENT
Skeptics see chapter 3 as a fool’s errand because they see enormous risks of capitalizing on chance when investigators get free license to fish in large empirical ponds until they finally catch something that correlates with forecasting accuracy. It was essential, therefore, to impose priorities in searching the vast universe of ways in which we human beings differ from one another. We could not test all possible meliorist hypotheses, but we could test three sizable subsets: those bearing on individual differences among experts in their backgrounds and accomplishments, in the content of their belief systems, and in their styles of reasoning.
Demographic and Life History Correlates
One can always second-guess the bona fides of the sample, but a lot of meliorist folk wisdom about good judgment bit the dust in this round of analyses. As table 3.1 shows, the list of close-to-zero, zero-order correlates is long. It made virtually no difference whether participants had doctorates, whether they were economists, political scientists, journalists, or historians, whether they had policy experience or access to classified information, or whether they had logged many or few years of experience in their chosen line of work. As noted in chapter 2, the only consistent predictor was, ironically, fame, as indexed by a Google count: better-known forecasters—those more likely to be fêted by the media—were less well calibrated than their lower-profile colleagues.
TABLE 3.1
Individual Difference Predictors of Calibration of Subjective Probability Forecasts
* .05 significance
** .01 significance
Adjusted R2 = .29 (N = 177)
Content Correlates
Political belief systems vary on many dimensions, and the lengthier the laundry list of predictors, the greater the risk of capitalizing on chance. To preempt such objections, I used maximum likelihood factor analysis to reduce an unwieldy number of questionnaire items (thirteen) to a manageable number of thematic composites (a three-factor solution).1 Table 3.2 presents the loadings of each variable on each factor. The higher a variable’s loading on a factor, and the lower its loadings on other factors, the more important the variable is in uniquely defining the factor. The resulting factors were as follows.
TABLE 3.2
Variable Loadings in Rotated Factor Matrix from Maximum Likelihood Factor Analysis (Quartimin Rotation) of Belief Systems Items
Note: Asterisks highlight five highest loadings for each factor and bold labels atop each column highlight meaning of high loadings.
LEFT VERSUS RIGHT
The left wanted to redress inequalities within and across borders, expressed reservations about rapid transitions from socialist to market economies and about the impact of trade liberalization and unrestricted capital flows on countries with weak regulatory institutions, and worried about nasty side effects on the poor and the natural environment. The right was enthusiastic about market solutions but had grave reservations about “governmental meddling” that shifted attention from wealth creation to wealth distribution. To quote one: “Government failure is far more common than market failure.”
INSTITUTIONALISTS VERSUS REALISTS
Realists agreed that, new world order rhetoric to the side, world politics remains a “jungle.” They were wary of subordinating national policy to international institutions and of trusting words and promises when money and power are at stake. They also
worried about the power of “matrioshka nationalisms” in which secessionists break up existing states but then splinter into factions seeking to secede from the secession.2 Institutionalists saw more potential for moving beyond the “cutthroat” logic of realism, nationalism, and deterrence. They stressed the necessity of coordinating national policy with international bodies, the capacity of new ideas to transform old definitions of national interests, and the dangers of failing to take into account the concerns of other parties.
DOOMSTERS VERSUS BOOMSTERS
Boomsters emphasized the resilience of ecosystems (their capacity to “snap back”) and the ingenuity of human beings in coping with scarcity (when the going gets tough, the tough get going and come up with cost-effective substitutes for nonrenewable resources). They put priority on economic growth and high-technology solutions, and the most radical believed that humanity was on the verge, with advances in artificial intelligence and molecular biology, of entering a posthuman phase of history populated by beings smarter, healthier, and happier than the dim, diseased, and depressed hominids roaming the planet in the late twentieth century. Doomsters stressed the fragility of ecosystems and the urgency of promoting sustainable development and living within the “carrying capacity constraints” of the planet. Radical doomsters believed that humanity is on the wrong track, one leading to “grotesque” income gaps between haves and have-nots, to “criminal plundering” of nature, and to growing violence in the underdeveloped world as scarcity aggravates ethnic and religious grievances.
Taken together, the three factors reveal a lot about the worldviews of participants.3 But these content factors offer little help in our search for broad bandwidth predictors of forecasting skill. Figure 3.1—which divides respondents into low, moderate, or high scorers on each factor—shows that neither low nor high scorers enjoy any notable advantage on either calibration or discrimination. This null result holds up across the zones of turbulence and stability, and across forecasting topics. Figure 3.1 does, however, offer a glimmering of hope that the meliorist quest for correlates of good judgment is not quixotic. Moderates consistently bested extremists on calibration—an advantage that they did not purchase by sacrificing discrimination.4
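For readers unfamiliar with how calibration and discrimination scores of this kind are computed, the sketch below follows the standard Murphy decomposition of the Brier score: forecasts are grouped by stated probability, and each group’s stated probability is compared with its observed frequency. The data and the function name are invented for illustration; this is not the study’s code.

```python
from collections import defaultdict

def murphy_scores(forecasts, outcomes):
    """Calibration and discrimination indices from Murphy's decomposition
    of the Brier score. forecasts: stated probabilities; outcomes: 0/1.
    Lower calibration scores are better; higher discrimination is better."""
    n = len(forecasts)
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)                     # group outcomes by stated probability
    base = sum(outcomes) / n                  # overall frequency of the outcome
    cal = sum(len(os) * (f - sum(os) / len(os)) ** 2 for f, os in bins.items()) / n
    disc = sum(len(os) * (sum(os) / len(os) - base) ** 2 for os in bins.values()) / n
    return cal, disc

# A perfectly calibrated toy forecaster: events tagged 0.8 happen 80 percent
# of the time, events tagged 0.2 happen 20 percent of the time.
f = [0.8] * 5 + [0.2] * 5
o = [1, 1, 1, 1, 0] + [0, 0, 0, 0, 1]
cal, disc = murphy_scores(f, o)
print(round(cal, 3), round(disc, 3))  # → 0.0 0.09
```

In this toy case the calibration penalty is zero (stated probabilities match observed frequencies exactly), while the discrimination score is positive because the forecaster sorts events into groups whose outcome frequencies differ from the overall base rate.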