Expert Political Judgment


by Philip E. Tetlock


  Figure 7.3. The set of possible Japanese futures unpacked into increasingly differentiated subsets.

  Summing Up the Scenario Experiments

  Scenario exercises are promoted in the political and business worlds as correctives to dogmatism and overconfidence. And by this point in the book, the need for such correctives should not be in question. But the scenario experiments show that scenario exercises are not cure-alls. Indeed, the experiments give us grounds for fearing that such exercises will often fail to open the minds of the inclined-to-be-closed-minded hedgehogs but succeed in confusing the already-inclined-to-be-open-minded foxes—confusing foxes so much that their open-mindedness starts to look like credulousness.15

  Figure 7.4. Effects of scenario-generation exercises on hedgehog and fox, expert and dilettante, forecasters of five- to ten-year futures of Japan (1992–1997–2002).

  Attentive readers will notice here a mirror image of the expertise-by-cognitive-style interaction that drove down the forecasting accuracy of hedgehogs in chapter 3. Back then, hedgehog accuracy suffered most when hedgehogs made predictions in domains where they were most knowledgeable—and thus had the cognitive resources to construct convincing stories for their favorite future. In chapter 7, fox coherence suffered most when foxes worked through scenario exercises in domains where they were most knowledgeable—and thus had the cognitive resources to construct convincing stories for widely divergent possible futures. In each case, cognitive style moderated how forecasters used their expertise: for self-justifying ends among hedgehogs and for self-subversive ends among foxes. Or, if we flip the interaction, we could say that expertise moderated the magnitude of the cognitive-style effect: without a rich knowledge base, the cognitive-style effect was anemic; with one, it was powerful.

  Two scenario experiments are obviously not a massive database, but the results are sufficiently similar that it is worth posing a thought experiment. Suppose that participants in all forecasting domains had worked through scenario exercises and the effects were identical to the Canadian and Japanese experiments. We could then extrapolate what the effects of scenario exercises would have been across the board by replacing the probabilities that various subgroups of forecasters assigned with the hypothetical probabilities they would have assigned if they had done scenario exercises. We know, for example, that the biggest increases were concentrated among foxlike experts contemplating change from the status quo. We also know that scenario effects were larger for low and middle probability categories (0.1 to 0.6) and smaller at the end points of zero (impossible) and 1.0 (sure thing), where forecasters were presumably more confidently settled in their judgments. What happens when we perform these value substitutions: pumping up low and moderate probabilities and merging probability categories whenever a lower-likelihood class of events (say, .2) rises so fast that it overtakes an adjacent category (say, .3)?

  Figure 7.5 shows the impact on two key indicators of forecasting accuracy—calibration and discrimination—for foxes and hedgehogs making predictions in their roles as experts or dilettantes. Even when we impose the reflective equilibrium constraint that scenario-inflated probabilities must add to 1.0, the projected effects on performance are uniformly negative. And when we relax the constraint that probabilities must add to 1.0, the projected effects are worse, with foxes taking a bigger hit than hedgehogs. The causal mechanisms are not mysterious. Scenarios impair accuracy because they embolden forecasters to attach excessive probabilities to too many possibilities, and this is especially true of foxes judging dramatic departures from the status quo. Indeed, returning to the catch-up theme of chapter 6, we find that hedgehogs who refuse scenario exercises are at performance parity with foxes who embrace the exercises.
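  To make the mechanism concrete, here is a minimal sketch in Python with purely hypothetical numbers (not the study's data, and using a simple Brier score in place of the book's calibration and discrimination indices): pumping up the probabilities of departures from the status quo worsens accuracy, and imposing the sum-to-1.0 constraint only partly limits the damage.

```python
# Hypothetical illustration only: not the study's data or its scoring indices.
# Lower Brier scores mean more accurate probability judgments.

def brier(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def normalize(probs):
    """Impose the reflective equilibrium constraint: rescale so probabilities sum to 1.0."""
    total = sum(probs)
    return [p / total for p in probs]

# Three mutually exclusive outcomes; only the first (the status quo) occurred.
outcomes = [1, 0, 0]

original    = [0.6, 0.3, 0.1]        # pre-scenario forecast, already sums to 1.0
inflated    = [0.6, 0.6, 0.4]        # scenario exercise pumps up the change-from-status-quo options
disciplined = normalize(inflated)    # [0.375, 0.375, 0.25]

print(round(brier(original, outcomes), 3))     # 0.087  (best)
print(round(brier(disciplined, outcomes), 3))  # 0.198  (worse)
print(round(brier(inflated, outcomes), 3))     # 0.227  (worst: inflated and incoherent)
```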

  It would be foolish to conclude from this extrapolation exercise that the scenario method can never help anyone. It is possible that we did not calibrate our scenario manipulations correctly and thus failed to produce the optimal blend of analytical and imaginative thinking that would open hedgehogs to new possibilities but not overwhelm foxes with too many possibilities. It is also possible that there are conditions under which the scenario method—even as operationalized here—could enhance accuracy. But those conditions are strikingly restrictive. Scenarios should most reliably help when (a) the reigning consensus favors continuation of the status quo; (b) big surprises lie in wait and will soon produce sharp departures from the status quo; and (c) the scenario script writers have good hunches as to the direction that change will take and skew the scenario elaboration exercises accordingly. Unfortunately, the data in earlier chapters make it hard to argue that experts (scenario writers included) can do much better than chance in anticipating big surprises for either the better or the worse. And the data on scenario effects in this chapter make it hard to argue that hiring scenario consultants is a prudent expenditure of resources unless the writers can do better than chance in targeting which futures to embellish and which to “pass on.”

  Figure 7.5. The performance of hedgehogs and foxes (H and F), making predictions inside or outside of their domains of expertise (E and D), deteriorates when we replace their original forecasts (the starting points at the origin of each set of arrows) with best estimates of the forecasts they would have made if they had disciplined their scenario-based thinking with reflective equilibrium exercises that required probabilities to sum to 1.0 (downward arrow from first to second data point) or if they had not so disciplined their scenario-based thinking (downward arrow from second to third data point). Lower scores on both the x- and y-axes signify worse performance. Both hedgehog and fox performance suffers, but fox performance suffers more.

  DEBIASING HOW WE THINK ABOUT POSSIBLE PASTS

  One might despair over the utility of the scenario method for improving probability judgments about possible futures but still hope the method will check biases in how people judge possible pasts. The first two studies in this section examine the power of thinking about counterfactual scenarios to check the well-known hindsight bias in which, once people learn of a historical outcome, they have difficulty recalling how they thought things would work out prior to learning of the outcome. To study this bias, we need to compare experts’ ex ante states of mind to their ex post recollections of those states of mind. The second set of studies assesses the impact of counterfactual scenario thinking on perceptions of historical events that occurred long before contemporary observers were born. We obviously can no longer measure the accuracy of ex post recollections of ex ante states of mind. But we can assess the degree to which counterfactual scenarios sensitize observers to contingencies that they previously downplayed and to hitherto latent logical inconsistencies in their probabilistic reasoning.

  Hindsight Bias

  The hindsight bias is a promising candidate for correction. As readers of chapter 4 may recall, in two of the certainty-of-hindsight studies conducted in 1997–1998, we asked specialists who had made predictions for North Korea and China in 1992–1993 to reconstruct the subjective probabilities that they had assigned to possible futures five years earlier. These studies revealed a hindsight bias: experts exaggerated the likelihood that they had assigned to the status quo options (in both cases, the correct “correspondence” answer). Experts originally set average probabilities of .38 for continuation of the political status quo in China and of .48 for continuation of the status quo in North Korea; five years later, experts recalled assigning subjective probabilities of .53 and .69 respectively.
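  Stated as simple arithmetic, the hindsight gap is the recalled probability minus the originally assigned probability:

\[
\Delta_{\text{China}} = 0.53 - 0.38 = 0.15,
\qquad
\Delta_{\text{North Korea}} = 0.69 - 0.48 = 0.21 .
\]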

  Normally, at this juncture, the researcher would reveal, as diplomatically as possible, our records of experts’ original expectations and proceed to measure the belief system defenses that were a major focus of chapter 4. In these two cases, however, we altered the interview schedule by asking experts to make another judgment: “Looking back on what has happened in China/North Korea over the last five years, we would value your expert opinion on how close we came to experiencing alternative outcomes—alternative outcomes that are either significantly better politically or economically than the current situation or significantly worse.” To create a social atmosphere in which participants felt they would not lose face if they changed their minds but also did not feel pressured to change their minds, we informed participants that “sometimes thinking about these what-if scenarios changes our views not only of the past but also our recollections of what we ourselves once thought possible or impossible. And sometimes these what-if exercises have no effect whatsoever. In light of the exercise you just did, do you want to modify any of your earlier estimates of the subjective probabilities that you assigned five years ago?”16

  Figure 7.6. The impact of imagining scenarios in which events could have unfolded differently on hindsight bias in 1997–1998 recollections of predictions made in 1992–1993 for China and North Korea. Reduced hindsight effects can be inferred from the smaller gap between post-scenario recollections and actual predictions than between immediate recollections and actual predictions.

  Figure 7.6 shows that encouraging forecasters to generate scenarios of alternative outcomes reduced the hindsight bias in both the North Korean and Chinese exercises. The reduction was roughly of equal magnitude for hedgehogs and foxes—which meant it was substantial enough to cut hindsight down to nearly zero for foxes and to halve the effect for hedgehogs. “Imaginability” largely drove these effects: the more elaborate experts’ scenarios of alternative pasts, the less prone experts were to the hindsight effect, and the power of scenarios to attenuate bias disappears once we statistically control for this elaborateness.

  These results are strikingly similar to those obtained in laboratory research on debiasing hindsight via “imagine alternative outcome” manipulations. Stimulating counterfactual musings helps to check smug “I knew it all along” attitudes toward the past. These results also dovetail with theoretical accounts that attribute the hindsight effect to the “automaticity” of theory-driven thought: the rapidity with which people assimilate known outcomes into their favorite cause-effect schemas, in the process demoting once possible, even probable, futures to the status of implausible historical counterfactuals. One mechanism via which scenario manipulations may be checking hindsight is by resurrecting these long-lost possibilities and infusing them with “narrative life.” Ironically, though, this resurrection is possible only if people fall prey to an opposing-process cognitive bias: the human tendency to attach higher probabilities to more richly embellished and thus more imaginable scenarios—the exact opposite of what we should do if we appreciated the basic principle that scenarios can only fall in likelihood when we add contingent details to the narrative.
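  The basic principle invoked here is the conjunction rule of probability theory: adding a contingent detail B to a scenario A can never raise, and will typically lower, the scenario's probability.

\[
P(A \wedge B) = P(A)\,P(B \mid A) \le P(A)
\]

Richly embellished scenarios are therefore, if anything, less probable than their sparser counterparts, which is why treating them as more likely counts as a bias.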

  The successful use of scenario exercises to check hindsight bias provides reassurance that the failure of scenario exercises to improve foresight was not just a by-product of the feebleness of our experimental manipulations. But this should be faint consolation for the consultants. There is not nearly as much money in correcting judgments of possible pasts as of possible futures. It is hard to envision hordes of executives clamoring for assistance in recalling more accurately how wrong they once were. Future work should address the possibility, however, that shattering the illusion of cognitive continuity (the “I knew it all along” attitude) is a necessary first step in transforming observers into better judges of the limits of their own knowledge (better confidence calibration) as well as more timely belief updaters (better Bayesians). Cultivating humility in our assessments of our own past predictive achievements may be essential to cultivating realism in our assessments of what we can do now and in the future.

  Sensitizing Observers to Historical Contingency

  Hindsight bias is a failing of autobiographical memory. When we examine historical thinking about events further back in time, however, we lose the valuable “what did you really think earlier” benchmark of accuracy. We cannot travel back in time to reconstruct how likely observers thought (prior to learning of outcomes) that the Cuban missile crisis of October 1962 would be resolved peacefully or that the July 1914 crisis preceding World War I would culminate in such carnage. Nonetheless, chapter 5 offers grounds for suspecting the operation of a more generic form of hindsight bias, a failure of historical imagination that limits our appreciation of possibilities that once existed but have long since been foreclosed. Observers in general, and hedgehogs in particular, often seem overeager to achieve explanatory closure and, in this quest, adopt a heavy-handedly deterministic stance toward history that portrays what is as something that had to be, as the inevitable consequence of the operation of favorite covering laws on well-defined antecedent conditions. One gauge of “how close to inevitable” those perceptions can become is the degree to which observers summarily reject close-call counterfactuals that imply history could easily have been rerouted. Close-call scenarios have the potential to mess up our understanding of the past, to riddle grand generalizations, such as “neorealist balancing” and “the robustness of nuclear deterrence,” with probabilistic loopholes.

  Ned Lebow, Geoffrey Parker, and I conducted two experiments that assessed the power of unpacking scenarios to open observers’ minds to the possibility that history contains more possibilities than they had previously supposed.17

  CUBAN MISSILE CRISIS EXPERIMENT

  One experiment examined retrospective judgments of experts on the inevitability of the Cuban missile crisis—a crisis that, as we saw in chapter 5, believers in the robustness of nuclear deterrence have difficulty imagining working out all that differently from how it did. In the control condition, experts fielded two questions that, on their face, look totally redundant. The inevitability curve question imposed a factual framing on the historical controversy over why the Cuban missile crisis ended as it did. It began by asking experts: At what point between October 16, 1962, and October 29, 1962, did some form of peaceful resolution of the crisis become inevitable (and thus deserve a probability of 1.0)? Then, after experts had specified their inevitability points, they estimated how the likelihood of some form of peaceful resolution waxed or waned during the preceding days of the crisis. The fourteen daily judgments, spanning October 16 to 29, defined the “inevitability” curve for each expert.

  The impossibility curve question is the logical mirror image of the inevitability curve question. It imposes a counterfactual framing on the historical controversy. It asks: At what point during the crisis, between October 16, 1962, and October 29, 1962, did all alternative, more violent endings of the Cuban missile crisis become impossible (and thus deserve to be assigned a subjective probability of zero)? After identifying their impossibility points, experts estimated how the likelihood of alternative, more violent endings waxed or waned during the preceding fourteen days of the crisis. These judgments defined the impossibility curve for each expert.

  Figure 7.7. Unpacking alternative, more violent endings of the Cuban missile crisis.

  In the “intensive unpacking” experimental condition, experts responded to the same questions with one key difference: the impossibility curve question now asked experts to judge the likelihood of alternative, more violent endings that had been decomposed into exhaustive and exclusive subsets. As Figure 7.7 shows, this set of counterfactual scenarios was first decomposed into subsets with fewer than one hundred casualties or with one hundred or more casualties, which, in turn, were broken into sub-subsets in which violence was limited to the Caribbean or violence extended outside the Caribbean. Finally, all subsets with one hundred or more casualties were broken down still further into those scenarios in which only conventional weaponry was used and those in which nuclear weaponry was used. After presenting these possibilities, we asked experts to perform the same inevitability-curve and impossibility-curve exercises as in the control condition but to do so for each of the six subsets that appear at the bottom of figure 7.7.

  We did not expect experts to be blatantly inconsistent. Our working hypothesis was that, when experts completed the two measures back to back, their judgments of the retrospective likelihood of some form of peaceful outcome would mirror their judgments of the retrospective likelihood of alternative, more violent, outcomes. Logic and psychologic should coincide when experts can plainly see that the summed probabilities of x and its complement, ~x, are 1.0. But we did not expect logic and psychologic always to coincide. Factual framings of historical questions invite experts to search for potent forces that create an inexorable momentum toward the actual outcome. To answer this question, analysts must convince themselves that they know roughly when x had to happen. By contrast, counterfactual framings of historical questions invite analysts to look for causal candidates that have the potential to reroute events down different paths. And when we unpack the counterfactual possibilities into detailed sub-scenarios, the invitation is all the stronger. Accordingly, we expected anomalies in retrospective likelihood judgments, such as sub-additivity, when we compared the judgments of two groups of experts, one of which had completed the inevitability curve exercise first, and the other of which had completed the impossibility curve exercise first, but neither of whom had yet responded to the exercise just completed by the other.
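  Sub-additivity here means that the judged probabilities of the unpacked subsets sum to more than the probability assigned to the packed whole. With purely hypothetical figures, the anomaly looks like this:

\[
P(\text{more violent ending}) = 0.20,
\qquad
\sum_{i=1}^{6} P(\text{violent subset}_i) = 0.48 > 0.20 .
\]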

 
