Expert Political Judgment

by Philip E. Tetlock


  The Hot Air Hypothesis

  Radical skeptics have been dealt a setback, but that should not prevent us from testing their remaining hypotheses. We have seen that forecasters pay a steep calibration price for attaching high likelihoods to low-frequency events. The hot air hypothesis asserts that experts are more susceptible to this form of overconfidence than dilettantes because experts “know too much”: they have so much case-specific knowledge at their fingertips, and are so skilled at marshalling that knowledge to construct compelling cause-effect scenarios, that they talk themselves into assigning extreme probabilities that stray further from the objective base-rate probabilities. As expertise rises, we should therefore expect confidence in forecasts to rise faster, far faster, than forecast accuracy.

  We should, though, be careful. Regression-toward-the-mean effects can mimic overconfidence. If we make the safe assumption that there is measurement error in our indicators of subjective probability and objective frequency, we should expect, purely by chance, that when experts assign extremely confident probabilities to outcomes—say zero (impossible) or 1.0 (sure thing)—those outcomes will materialize more than 0 percent and less than 100 percent of the time (exactly how much more or less is a function of the correlation between subjective and objective probabilities and the size of the standard deviations—see the Technical Appendix). The key question is whether the magnitude of the observed overconfidence—either that things will (1.0) or will not happen (zero)—exceeds what we would expect based on chance.
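
  To see how a pure regression artifact can masquerade as overconfidence, here is a minimal simulation sketch; the uniform distribution of true probabilities and the 0.15 noise level are illustrative assumptions, not the study’s measurements.

```python
# Minimal sketch: measurement error alone makes unbiased judges look
# overconfident at the end points of the probability scale.
# The distributions and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

true_p = rng.uniform(0.0, 1.0, n)                          # objective probabilities
judged = np.clip(true_p + rng.normal(0.0, 0.15, n), 0, 1)  # noisy subjective readings
occurred = rng.random(n) < true_p                          # outcomes resolve by true_p

# Among forecasts stamped "impossible" (0) or "sure thing" (1.0),
# how often does the event actually occur?
print("judged 0.0 -> observed frequency:", occurred[judged == 0].mean())
print("judged 1.0 -> observed frequency:", occurred[judged == 1].mean())
# Frequencies land well above 0 and below 1, mimicking overconfidence
# even though the simulated judges harbor no bias at all.
```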

  To answer this question, we need to determine whether the average probability judgment is significantly different from the average objective frequency for each of the three superordinate classes of events: no change versus change for either the better or the worse. Consistent with the systematic error hypothesis, a series of t-tests reveals significant probability-reality gaps for both experts and dilettantes. Both groups assign too-high probabilities to change, especially change for the worse, and too-low probabilities to the status quo. And consistent with the hot air hypothesis, we find bigger probability-reality gaps for experts than for dilettantes. Experts pay a penalty for using the extreme end points of the probability scale more often: pronouncing outcomes either impossible (zero) or almost impossible (.1) and either inevitable (1.0) or almost inevitable (.9). Of all the predictions experts made, 30.3 percent declared outcomes either impossible or highly unlikely and 6.9 percent declared outcomes either certain or highly likely. By contrast, dilettantes were more diffident, assigning only 25.8 percent of their predictions to the lowest probabilities and 4.7 percent to the highest. Given that we already know experts and dilettantes are roughly equally adept at assigning probabilities (the impossible or nearly impossible happen about 15 percent of the time for both groups, and sure things or almost sure things fail to happen about 27 percent of the time for both groups), the arithmetic dictates that experts have more big mistakes to explain away than do dilettantes.
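
  The arithmetic can be made explicit. The sketch below combines the usage shares just quoted with the shared error rates of roughly 15 and 27 percent; the per-100-forecast framing is an illustrative convenience, not the book’s presentation.

```python
# Back-of-envelope check on "more big mistakes to explain away."
# Shares and error rates come from the text; the per-100-forecast
# framing is an illustrative convenience.
extreme_low  = {"experts": 0.303, "dilettantes": 0.258}  # judged impossible/highly unlikely
extreme_high = {"experts": 0.069, "dilettantes": 0.047}  # judged certain/highly likely
occur_anyway, fail_anyway = 0.15, 0.27  # shared error rates at each end point

for group in ("experts", "dilettantes"):
    big_mistakes = 100 * (extreme_low[group] * occur_anyway +
                          extreme_high[group] * fail_anyway)
    print(f"{group}: ~{big_mistakes:.1f} big mistakes per 100 forecasts")
# experts: ~6.4, dilettantes: ~5.1 -- equal hit rates, greater exposure.
```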

  Assessing the core claim of the hot air hypothesis—that one-sided justifications are pumping up overconfidence among experts—requires moving beyond ticks on probability scales and coding forecasters’ rationales for their predictions. As described in the Methodological Appendix, we asked all participants to provide explanations for their forecasts on one topic within their field of expertise and on one topic outside their field. These thought protocols were then subjected to content analyses that assessed, among other things, verbosity, the number of arguments favoring each possible future, and the balance of arguments favoring or opposing the most likely future. The results confirmed that (a) the more relevant expertise forecasters possessed, the more elaborate their justifications for their forecasts: the average number of causal arguments advanced for “most likely futures” in experts’ thought protocols was 5.4, whereas the average for dilettantes was 2.9, a highly significant difference; and (b) the more lopsidedly these arguments favored the most likely future (the ratio of pro to con arguments), the higher the perceived likelihood of that future (r = .45).
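
  For concreteness, here is how that coding step might look; the argument counts below are invented, and only the benchmark correlation of r = .45 comes from the actual protocols.

```python
# Hypothetical protocol-coding step: form each forecaster's ratio of
# pro to con arguments and correlate it with the stated likelihood of
# the "most likely future." The counts here are invented for illustration.
import numpy as np

pro_args = np.array([5, 3, 6, 2, 4, 7])              # arguments favoring the pick
con_args = np.array([1, 2, 1, 2, 3, 1])              # arguments against it
stated_p = np.array([.90, .60, .95, .50, .60, .90])  # perceived likelihood

lopsidedness = pro_args / con_args                   # pro-to-con ratio
r = np.corrcoef(lopsidedness, stated_p)[0, 1]
print(f"r = {r:.2f}")  # the study reports r = .45 on the real protocols
```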

  Clinching the argument requires, however, demonstrating the mediating role of thoughts in producing differential confidence among experts and dilettantes. Here, regression analysis revealed that the tendency for experts to make more extreme predictions than dilettantes vanishes when we control statistically for the links between expertise and loquacity (number of arguments) and between loquacity and extremity. The expertise-extremity relationship also disappears when we control for the lopsidedness of the argument count in favor of the most likely future. In both cases, the relevant partial correlations fell below .10.
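
  The logic of that mediation check can be sketched with simulated stand-ins for the real measures: when loquacity fully carries the expertise effect, the partial correlation between expertise and extremity collapses toward zero.

```python
# Schematic mediation check: expertise -> loquacity -> extremity.
# The data are simulated; only the logic mirrors the analysis.
import numpy as np

rng = np.random.default_rng(1)
n = 500
expertise = rng.normal(size=n)
loquacity = expertise + rng.normal(scale=0.5, size=n)  # experts marshal more arguments
extremity = loquacity + rng.normal(scale=0.5, size=n)  # more arguments, bolder forecasts

def partial_corr(x, y, z):
    """Correlate x and y after regressing z out of both."""
    resid_x = x - np.polyval(np.polyfit(z, x, 1), z)
    resid_y = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(resid_x, resid_y)[0, 1]

print("zero-order r(expertise, extremity):", round(np.corrcoef(expertise, extremity)[0, 1], 2))
print("partial r, controlling loquacity:  ", round(partial_corr(expertise, extremity, loquacity), 2))
# The zero-order link is strong; the partial correlation falls near zero,
# the signature of mediation the text describes.
```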

  The Seduction Hypothesis

  The final tenets of radical skepticism, the fifth and sixth, deal not with the accuracy of expert advice but rather with the forces that drive supply and demand for such advice. The fifth tenet declares that the expert community has too great a vested interest in self-promotion to cease and desist from supplying snake oil forecasting products. If this sounds a tad harsh, it is meant to be. Hard-core skeptics do not mince words when it comes to appraising the gullibility of their fellow citizens, the forecasting skills of their colleagues, and the psychological obstacles that make it difficult both for citizens to recognize that the imperial expert has no clothes and for experts to acknowledge the naked truth.

  Inspection of correlation coefficients culled from the forecasting exercises illustrates how supply-side processes may prime the pump of overconfident advice that flows from well-placed epistemic communities into the policy world. We asked a large group of participants how often they advised policy makers, consulted with government or business, and were solicited by the media for interviews. Consistent with the seduction-by-fame-fortune-and-power hypothesis, experts in demand were more overconfident than their colleagues who eked out existences far from the limelight, r (136) = .33, p < .05. A similar correlation links overconfidence and the number of media mentions that participants received, according to a Google search count (r = .26). Both relationships fell to nonsignificance after controlling for a cognitive-style measure derived from the thought protocols. More “balanced” thinkers (who were prone to frame arguments in “on the one hand” and “on the other” terms) were less overconfident (r = -.37) and less in the limelight (r = -.28). Of course, causality surely flows in both directions. On the one hand, overconfident experts may be more quotable and attract more media attention. On the other, overconfident experts may also be more likely to seek out such attention. The three principals—authoritative-sounding experts, the ratings-conscious media, and the attentive public—may thus be locked in a symbiotic triangle. It is tempting to say they need each other too much to terminate a relationship merely because it is based on an illusion. We return to these issues in chapter 8, where we take up Richard Posner’s thesis that there are systematic distortions in the media markets for intellectual commentary.

  The Indefinitely Sustainable Illusion Hypothesis

  This final hypothesis declares that, no matter how unequivocal the evidence that experts cannot outpredict chimps or extrapolation algorithms, we should expect business to unfold as usual: pundits will continue to warn us on talk shows and op-ed pages of what will happen unless we dutifully follow their policy prescriptions. We—the consumers of expert pronouncements—are in thrall to experts for the same reasons that our ancestors submitted to shamans and oracles: our uncontrollable need to believe in a controllable world and our flawed understanding of the laws of chance. We lack the willpower and good sense to resist the snake oil products on offer. Who wants to believe that, on the big questions, we could do as well tossing a coin as by consulting accredited experts?

  A simple experiment on Berkeley undergraduates illustrates the demand side of the equation. We presented one of two fictitious scenarios set in sub-Saharan Africa: either a low-stakes decision in which the worst-case outcome was a minor escalation of tensions between two ethnic groups inhabiting the same country or a high-stakes decision in which the worst-case outcome was a bloodbath with many thousands of deaths. Students then judged how much confidence they would have in policy advice from one of two sources: a panel of prestigious social scientists drawn from major universities but lacking specialized knowledge of the region, and an otherwise identical panel that did possess specialized knowledge of the region. The will to believe in the predictive and prescriptive powers of expertise should be strongest when observers believe that lives hang in the balance. Consistent with this notion, increasing the stakes boosted the perceived likelihood that the relevant experts would be right (from 65 percent to 79 percent) but left the corresponding figure for the irrelevant experts essentially unchanged (54 percent versus 52 percent). Responsible people reach for the best possible advice; to do anything else would be, well, irresponsible. The mystique of expertise is so rooted in our culture that failing to consult the right experts is as unconscionable a lapse of due diligence as failing to consult witch doctors or Delphic oracles would have been in other times.
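
  The interaction pattern can be laid out as a simple 2×2 table; the cell values are the percentages just reported, and the contrast arithmetic is an illustrative addition.

```python
# The stakes-by-relevance result as a 2x2 table of mean confidence
# (percent). Cell values come from the text; the contrast is arithmetic.
confidence = {
    ("low",  "relevant"):   65,
    ("high", "relevant"):   79,
    ("low",  "irrelevant"): 54,
    ("high", "irrelevant"): 52,
}

boost_relevant   = confidence[("high", "relevant")]   - confidence[("low", "relevant")]
boost_irrelevant = confidence[("high", "irrelevant")] - confidence[("low", "irrelevant")]
print("stakes effect, relevant experts:  ", boost_relevant)        # +14
print("stakes effect, irrelevant experts:", boost_irrelevant)      # -2
print("interaction contrast:", boost_relevant - boost_irrelevant)  # 16
```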

  The experiment also sheds light on how political accountability can foster reliance on expertise. We asked participants to imagine that government officials made a decision that worked out either badly or well after they had consulted either relevant or less clearly relevant experts. When people imagined a policy failure, they attached greater responsibility to officials who failed to consult the right experts. To be sure, going through the right process motions does not immunize one from criticism: consulting the right people and failing is still not as good as succeeding, regardless of whom one consults. But it takes some sting out of the criticism.

  Here, then, is a reason for skeptics to despair. Even if they win all the scientific battles, they may still lose the political war. Defeat is likely, in part, because it will be hard to convince those with their hands on the levers of power to accept the unsettling implications of the skeptics’ warnings. Defeat is also likely, in part, because, even if skeptics overcome this formidable psychological barrier, they face an even more formidable societal one. Given prevailing accountability norms and practices, even decision makers who believe the skeptics are right should continue soliciting advice from the usual suspects. They know that the anticipated blame from a policy fiasco in which they bypassed the relevant experts substantially exceeds that from a fiasco in which they ritualistically consulted the relevant experts.

  GROPING TOWARD COMPROMISE: SKEPTICAL MELIORISM

  The radical skeptics’ assault on the ivory-tower citadels of expertise inflicted significant, if not irreparable, reputational damage. Most experts found it ego deflating to be revealed as having no more forecasting skill than dilettantes and less skill than simple extrapolation algorithms. These are not results one wants disseminated if one’s goal is media acclaim or lucrative consulting contracts, or even just a bit of deference from colleagues to one’s cocktail-party kibitzing on current events. Too many people will have the same reaction as one take-no-prisoners skeptic: “Insisting on anonymity was the only farsighted thing those characters did.”50

  There is, then, a case for closing the scientific case and admonishing pundits to bring their inflated self-concepts into alignment with their modest achievements. But the case for closure also has weaknesses. Several pockets of evidence suggest that the meliorist search for correlates of good judgment is not as quixotic as radical skeptics portray it. There are numerous hints that crude human-versus-mindless-algorithm or expert-versus-dilettante comparisons are masking systematic individual differences in forecasting skill. The meliorists may be right that good judgment is not reducible to good luck.

  The die-hard skeptics will resist. They see no point in taking what meliorists consider the natural next step: moving beyond generalizations about forecasters as a whole and exploring variation in forecasting skill that permits us to answer more subtle questions of the form “Who was right about what, when, and why?” There is no point because we now know that variation in forecasting skill is roughly normally distributed, with means hovering not much above chance and slightly below case-specific extrapolation algorithms. Would we not expect exactly these patterns if experts on average had almost no forecasting skill, but some lucky souls got higher scores and others lower ones? To be sure, if one looks long enough, one will find something that correlates with something else. But one will have frittered away resources in pursuit of will-o’-the-wisp relationships that will fail to hold up in new periods just as surely as will Aunt Mildred’s astrological guide that served her so well at roulette last week. Truth in advertising requires presenting any search beyond chapter 2 not as one for correlates of ability to make stunningly accurate forecasts but rather as one for correlates of ability to avoid the massive mistakes that drive the forecasting skill of certain groups deep into negative territory, below what one could achieve by relying on base rates or predict-the-past algorithms.

  It would be rash to ignore the skeptics’ warnings. But it would also be rash to ignore the strong hints that certain brands of meliorism have validity. Uninspiring though absolute levels of forecasting skill have been, relative differences in skill are consequential. Experts who speak truth to power are not about to be replaced by extrapolation algorithms. And, among these experts, there is no shortage of competing views on the conceptual recipe for good judgment. Proponents of these views regularly dispense their conflicting advice in magazines, at conferences, and on television talk shows—advice that has rhetorical force only insofar as the audience grants that the source knows something about the future that the source’s sparring partners do not. Tax cuts will either promote or impede or have no effect on GDP growth; pursuing ballistic missile defense projects will have either good, bad, or mixed effects. The current approach holds out the promise of determining which perspectives are linked to more accurate predictions. Keeping a rough tally of who gets what right could serve the same public-service functions that the tallying exercises of the Wall Street Journal and the Economist serve in the domains of equity markets and macroeconomics: providing reality checks on self-promotional puffery and aiding consumers in judging the quality of competing vendors in the marketplace of ideas.

  Finally, regardless of whether it is rash to abandon the meliorist search for the Holy Grail of good judgment, most of us feel it is. When we weigh the perils of Type I errors (seeking correlates of good judgment that will prove ephemeral) against those of Type II errors (failing to discover durable correlates with lasting value), it does not feel like a close call. We would rather risk anointing lucky fools than risk ignoring wise counsel. Radical skepticism is too bitter a doctrinal pill for most of us to swallow.

