Expert Political Judgment


by Philip E. Tetlock

The first prong of this three-pronged defense questions the professional qualifications of forecasters. Perhaps I recruited “unusually dumb hedgehogs” and “unusually smart foxes.” I cannot respond by offering IQ data (or by lifting the veil of anonymity promised participants). I cannot even claim representative samples of “experts” (my samples are samples of convenience). But I can point to the summary profile data in the Methodological Appendix that show most participants had postgraduate degrees and, on average, twelve years of professional experience. And I can point to the analyses in chapter 3 that reveal negligible relations between (a) professional status, seniority, and field of specialization and the key correspondence and coherence measures of good judgment; (b) cognitive style and professional specialization or status. These arguments do not eliminate the possibility that another, more “elite” cohort of hedgehogs would have bested foxes, but they do shift the burden of proof to challengers.

  The second prong of the three-pronged defense questions how motivated participants were to display good judgment. Here it is necessary to cede a bit more ground. Many of the judgments we elicited were undoubtedly top-of-the-head opinions that experts knew they would never have to justify. This raises the possibility that experts might have done better if the stakes had been higher and they had been more motivated to get it right. There is some truth in this objection. Accountability and incentives do sometimes “de-bias” judgment. But they also often have either no effect or the effect of amplifying, not attenuating, bias and error.19 And it is a stretch to argue that either incentives or accountability pressures would have helped only hedgehogs. Indeed, if accountability motivates experts to do their cognitive best, and if hedgehogs and foxes have different views on what their cognitive best is, one can make a good case that accountability pressures would widen the performance gap by making hedgehogs even more hedgehogish and foxes even more foxish.20

  The last prong of the three-pronged defense declares that experts look far worse than they normally would because we picked an unusually turbulent period of twentieth-century history in which to study good judgment. This objection has little merit. First, it misses a key argument of this book. Of course, experts can predict more accurately in periods of tranquility. But the data show that, even in quiescent times, hedgehogs suffer a performance deficit (relative to both foxes and simple extrapolation algorithms). Second, this objection underestimates how turbulent other fifteen-year slices of the twentieth century have been. World War I astonished those who thought that the major powers were too financially interdependent to go to war. The Great Depression startled economists who thought that boom-bust cycles were passé. And the threat posed by the Nazis was recognized by only an opinionated minority until late in the game. The transformations wrought by the advent of nuclear weaponry and later ballistic missiles were mostly unanticipated (and continue to provoke controversy as experts offer radically discrepant estimates of both the feasibility and long-term effects of ballistic missile defense). The dramatic turn of events in China in the 1970s—Maoist extremism evolving into pragmatic reformism—may have an illusory retrospective inevitability, but few foresaw it. The case that experts got stuck with an unusually unpredictable period is weak.

  Misunderstanding What Game Is Being Played

  Hedgehogs can play a final card. They can argue that epistemic criteria do not apply to them: their real goal is political impact. As one hedgehog resident of a “think tank” patiently explained, “You play a publish-or-perish game run by the rules of social science…. You are under the misapprehension that I play the same game. I don’t. I fight to preserve my reputation in a cutthroat adversarial culture. I woo dumb-ass reporters who want glib sound bites.” In his world, only the overconfident survive, and only the truly arrogant thrive.

  Another hedgehog stressed the need—if one wants to be remembered—to make bold claims that “run against the grain of the conventional wisdom.” One needed considerable gumption in the mid-1930s to tell the influential appeasers in the governments of the major democracies that Nazi Germany was a gangster state with which it was impossible to do business, or to announce in the late 1970s that China, staggered by the convulsions of the Cultural Revolution, was about to take off economically, or that OPEC, riding high from having repeatedly multiplied the price of oil, would soon get its comeuppance, or to declare in the early 1980s that the Soviet Union was marching straight into the ash heap of history.

  It is tough to gauge whether this objection is a flimsy excuse or a compelling alternative explanation. But the evidence tips toward “excuse.” In debriefing interviews, we asked nearly half of the participants whether they saw themselves more as “neutral observers whose primary goal is figuring out what is going on” or “promoting a point of view.” We found only the slightest of tendencies for self-professed neutral observers to be either foxes (r = .10) or better forecasters (r = .08). It is thus hard to argue that hedgehogs lost to foxes as consistently as they did because they were playing a policy-advocacy game.

  CLOSING OBSERVATIONS

  Even the formidable combination of defenses mobilized in this chapter fails to acquit hedgehogs of all allegations of error and bias leveled against them. But the defense objections took some sting out of certain allegations. Hedgehogs narrow the performance gap when we introduce big value adjustments (giving them the benefit of the doubt that their mistakes were the right mistakes), big probability-weighting adjustments (giving them credit for making courageous predictions) and big fuzzy-set adjustments (giving them some credit for being “almost right”). Defenders of hedgehogs are also right that endorsing a belief system defense does not automatically make one defensive, that some double standards are justifiable, and that openness to close-call counterfactuals is not presumptive evidence of open-mindedness (it may be a sign that one has not thought things through). Most important, defenders of hedgehogs do us a service by calling our attention to the elaborate matrix of assumptions on which attributions of “better” or “worse” judgment must rest. Claims of the form “members of group X have better judgment than members of group Y” are not purely scientific; they are complex amalgams of empirical generalizations, value priorities, and even metaphysical sentiments.

  In sum, chapter 6 reminds us of the provisional character of judgments of good judgment. It transforms what has heretofore been a cognitive morality play, populated with well-defined good and bad guys, into a murkier tale, populated by characters attired in varying shades of grey. Chapter 7 takes us still further in this anti-Manichaean direction.

  1 Suedfeld and Tetlock, “Individual Differences”; G. Gigerenzer and P. M. Todd, Simple Heuristics That Make Us Smart (New York: Oxford University Press, 2000).

  2 On the risks of looking too hard for signals in “noisy” data, see R. E. Nisbett, H. Zukier, and R. Lemley, “The Dilution Effect: Nondiagnostic Information,” Cognitive Psychology 13 (1981): 248–77; P. E. Tetlock and R. Boettger, “Accountability: A Social Magnifier of the Dilution Effect,” Journal of Personality and Social Psychology 57 (1989): 388–98.

  3 P. E. Tetlock and A. Tyler, “Winston Churchill’s Cognitive and Rhetorical Style: The Debates over Nazi Intentions and Self-government for India,” Political Psychology 17 (1996): 149–70.

  4 P. E. Tetlock and R. Boettger, “Accountability Amplifies the Status Quo Effect When Change Creates Victims,” Journal of Behavioral Decision Making 7 (1994): 1–23.

  5 Suedfeld and Tetlock, “Individual Differences”; G. Gigerenzer and P. M. Todd, Simple Heuristics That Make Us Smart (New York: Oxford University Press, 2000).

  6 B. M. Staw and J. Ross, “Commitment in an Experimenting Society: A Study of the Attribution of Leadership from Administrative Scenarios,” Journal of Applied Psychology 65 (1980): 249–60.

  7 Figure 6.1 reveals that errors of underprediction dwarf those of overprediction. This result is, however, “forced” by the logical structure of the forecasting task. In three-possible-future tasks, the probabilities assigned to each possible future—status quo, and change in the direction of either less or more of something—were usually constrained to add to 1.0 and thus average to a grand mean of .33. The actual averages for these futures hovered between .25 and .40, and these values are obviously further from 1.0 (the value taken by reality when the target event occurs and the only possible error is underprediction) than from zero (the value taken by reality when the target event does not occur and the only possible error is overprediction).
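
  Concretely (a worked example using the grand-mean value reported above), for a forecast of .33 the two possible error magnitudes are

  \[
  |1.0 - 0.33| = 0.67 \quad \text{(event occurs: underprediction)}
  \qquad\text{versus}\qquad
  |0 - 0.33| = 0.33 \quad \text{(event does not occur: overprediction)},
  \]

  so near the grand mean, underprediction errors run roughly twice the size of overprediction errors, which is the pattern figure 6.1 displays.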

  8 In prospect theory, the shape of the probability-weighting function reflects the psychophysics of diminishing sensitivity: marginal impact diminishes with distance from reference points. For probability assessments, there are two reference points: impossibility (zero) and certainty (1.0). Diminishing sensitivity implies an inverse S-shaped weighting function that is concave near zero and convex near 1.0. The impact of a given change in probability diminishes with its distance from the natural boundaries of impossibility and certainty. This weighting function helps to explain the well-established fourfold pattern of risk attitudes: overweighting low probabilities (risk seeking for gains and risk averse for losses), and underweighting high probabilities (risk averse for gains and risk seeking for losses). See A. Tversky and D. Kahneman, “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty 5 (1992): 297–323.
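
  For concreteness, a commonly cited one-parameter form of such a weighting function (a sketch based on the Tversky and Kahneman paper cited above, with γ as its single free parameter) is

  \[
  w(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}}, \qquad 0 < \gamma < 1,
  \]

  a curve that is concave near zero, convex near 1.0, and crosses the diagonal at an intermediate probability; with γ well below 1 (their median estimate for gains is roughly .61), low probabilities are overweighted and high probabilities underweighted, yielding the fourfold pattern described above.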

  9 Indeed, there is so much ambiguity that Robert Jervis has argued that the cognitive bias of base-rate neglect is not a bias in world politics. See R. Jervis, “Representativeness in Foreign Policy Judgments,” Political Psychology 7 (1986): 483–505.

  10 One could also argue that, although the fox and hedgehog environments may have been roughly equally difficult to predict—as gauged by overall variability and percentage of that variability that can be captured in formal statistical models—the foxes may have won merely because they were better at picking out variables with large autocorrelations (variables that could thus be predicted by extrapolating from the past) or because they were more attuned to intercorrelations among outcome variables (and thus aware of the implications of change in one variable for other variables). The Technical Appendix shows this is not true. The fox advantage holds up, reasonably evenly, across variables with the smallest to the largest squared multiple correlations. But, even if this objection were true, it is hardly a compelling defense to say that hedgehogs were so committed to their theories that they missed such obvious predictive cues.
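
  To make “extrapolating from the past” concrete (an illustration under the simplifying assumption that a variable follows a first-order autoregressive process with mean μ and autocorrelation ρ), the best extrapolative forecast of the next observation is

  \[
  \hat{y}_{t+1} = \mu + \rho\,(y_t - \mu),
  \]

  so the larger the autocorrelation, the more of a variable’s future can be recovered simply by projecting its current deviation from the mean forward.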

  11 Fuzzy sets are not the product of fuzzy math. See L. Zadeh, “A Fuzzy-Set-Theoretic Interpretation of Linguistic Hedges,” Journal of Cybernetics 2 (1972): 4–34. On the implications for social science of fuzzy-set concepts, see C. Ragin, Fuzzy-Set Social Science (Chicago: University of Chicago Press, 2000).

  12 For a popularization of this argument, see J. Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations (New York: Doubleday, 2004).

  13 On falsification and the methodology of scientific research programs, see Suppe, “Scientific Theories,” and Lakatos, “Research Programs.” Objections of this sort played a role in persuading philosophers to abandon simple (Popperian) falsificationism in favor of more complex variants of the doctrine (Suppe). It is not necessary here to stake out a detailed position on falsificationism. It is sufficient to specify a procedural test of bias that, if failed, would convince even forgiving falsificationists that something is awry. This de minimis test asks: Do judges who “got it wrong” display vastly greater interest than judges who “got it right” in challenging the probity of the exercise? If so, we still cannot determine who is biased (incorrect forecasters may be too quick to complain or correct forecasters too quick to accept the test), but we can say that bias exists.

  14 Suppe, “Scientific Theories.”

  15 Ibid.

  16 For illustrations of how easy it is to put conflicting spins on low and high ends of cognitive style dimensions, see P. E. Tetlock, R. Peterson, and J. Berry, “Flattering and Unflattering Personality Portraits of Integratively Simple and Complex Managers,” Journal of Personality and Social Psychology 64 (1993): 500–511; P. E. Tetlock, D. Armor, and R. Peterson, “The Slavery Debate in Antebellum America: Cognitive Style, Value Conflict, and the Limits of Compromise,” Journal of Personality and Social Psychology 66 (1994): 115–26. These conflicting spins also tie into old debates between cognitive consistency theorists over how tightly integrated belief systems tend to be. Minimalists such as Robert Abelson stressed the wispy connections among idea-elements, whereas maximalists such as William McGuire posited more constraints. See R. Abelson, “Psychological Implication,” in Theories of Cognitive Consistency: A Source Book, ed. R. Abelson, E. Aronson, W. McGuire, T. Newcomb, M. Rosenberg, and P. Tannenbaum (Chicago: Rand McNally, 1968), 112–39; W. J. McGuire, “Theory of the Structure of Human Thought,” in Abelson et al., Theories of Cognitive Consistency.

  17 U. Hoffrage, R. Hertwig, and G. Gigerenzer, Journal of Experimental Psychology: Learning, Memory, and Cognition 26 (2000): 303–20. Curiously, psychologists who mostly disagree over the adaptive value of certainty of hindsight agree on the processes underlying the effect. They agree that when people cannot remember their original judgment—which often happens—they reconstruct the judgment based on what they know about the situation. They agree that people automatically use outcome feedback to update their knowledge about the situation. And they agree that people reconstruct their original judgments using this updated knowledge.

  18 It is useful to distinguish moderate from extreme proponents of this defense. My differences with the moderates are a matter of degree: I preferred shorter time frames and more precisely defined outcome variables. But my differences with the extremists—those more at home with prophecy than prediction—are unbridgeable. One hedgehog asked whether my “tidy scheme” left room for visionaries such as Nietzsche or Marx. Would I count Nietzsche’s “God is dead” pronouncement as wrong because religion still thrives in the late twentieth century? Could I concede that Nietzsche had anticipated that totalitarian movements would fill the spiritual void left by the death of God? My view is that the God-is-dead prediction might be resuscitated via the off-on-timing defense, but that it currently looks like a loser, and that Nietzsche should not get credit for anticipating Stalinism or Nazism given that there is scant evidence that these phenomena arose because people had stopped believing in God and scanter evidence that Nietzsche came remotely close to predicting when, where, and how these phenomena would arise. We might as well credit Nostradamus with predicting World War II.

  19 C. F. Camerer and R. M. Hogarth, “The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework,” Journal of Risk and Uncertainty 19 (1999): 7–42.

  20 This influential hypothesis holds that accountability and other possible motivators of cognitive work have the net effect of increasing the likelihood of dominant or overlearned responses. P. E. Tetlock and J. Lerner, “The Social Contingency Model: Identifying Empirical and Normative Boundary Conditions on the Error-and-Bias Portrait of Human Nature,” in Dual Process Models in Social Psychology, ed. S. Chaiken and Y. Trope (New York: Guilford Press, 1999).

  CHAPTER 7

  Are We Open-minded Enough to Acknowledge the Limits of Open-mindedness?

  The impossible sometimes happens and the inevitable sometimes does not.

  — KAHNEMAN

  Like the measured length of a coastline, which increases as the map becomes more specific, the perceived likelihood of an event increases as its description becomes more specific.

  —TVERSKY AND KOEHLER 1994

  These observations suggest an image of the mind as a bureaucracy in which different parts have access to different data, assign them different weights, and hold different views of the situation.

  — KAHNEMAN AND TVERSKY 1982

  CHAPTER 6 revealed considerable canniness in what looked like incorrigible closed-mindedness. It failed, however, to exonerate experts of all the cognitive indictments against them. In this chapter, therefore, let us assume a lingering problem: all too often experts, especially the hedgehogs among them, claim to know more about the future than they actually know (chapter 3), balk at changing their minds in the face of unexpected evidence (chapter 4), and dogmatically defend their deterministic explanations of the past (chapter 5).

  The diagnosis implies a cure: observers would stack up better against our benchmarks of good judgment if only they were a tad more open-minded. We should not be glib, though, in our prescriptions. Careless cures can cause great harm. Promoters of “debiasing” schemes should shoulder a heavy burden of proof. Would-be buyers should insist that schemes that purportedly improve “how they think” be grounded in solid assumptions about (a) the workings of the human mind and—in particular—how people go about translating vague hunches about causality into the precise probabilistic claims measured here; (b) the workings of the external environment and—in particular—the likely impact of proposed correctives on the mistakes that people most commonly make in coping with frequently recurring challenges.

  Chapter 7 reports the first systematic studies of the impact of a widely deployed debiasing tool, scenario exercises, on the judgmental performance of political experts in real-world settings.1 Such exercises rest on an intuitively appealing premise: the value of breaking the tight grip of our preconceptions on our views of what could have been or might yet be. I am also convinced from personal experience that such exercises, skillfully done, have great practical value in contingency planning in business, government, and the military. But the data reported in this chapter make it difficult to argue that such exercises—standing alone—improve either the empirical accuracy or logical coherence of experts’ predictions. For scenario exercises have no net effect on the empirical accuracy and logical coherence of the forecasts of roughly one-half of our sample (the hedgehogs) and an adverse net effect on the accuracy and coherence of the forecasts of the other half (the foxes). The more theory-driven hedgehogs find it easier to reject proliferating scenario branching points summarily, with a brusque “It just ain’t gonna happen.” The more open-minded foxes find it harder to resist invitations to consider strange or dissonant possibilities—and are thus in greater danger of being lured into wild goose chases in which they fritter away scarce resources contemplating possibilities they originally rightly dismissed. For the first time in this book, foxes become more susceptible than hedgehogs to a serious bias: the tendency to assign so much likelihood to so many possibilities that they become entangled in self-contradictions.

 
