Expert Political Judgment

by Philip E. Tetlock


  Like Molière’s good doctor who discovered that he spoke prose, relativist critics may be astonished to discover that even they—as well as the experts they defend—have been speaking “probabilities” for most of their sentient existence on this planet. From roughly the age of five years onward, people begin deploying linguistic expressions for quantifying their uncertainty about both possible pasts and possible futures. Initially, the lexicon is impoverished: a “maybe” here or a “not sure” there. Over time, though, educated people become quite adept at distinguishing degrees of confidence in outcomes: “absolute certainty,” or “virtual certainty” or “probably,” or “slightly better than even odds” or “50/50,” or “a bit worse than even odds,” or “somewhat improbable,” or “quite unlikely,” or “bad bet,” or “only the remotest chance,” or “not a snowball’s chance in hell.” Such everyday expressions do not have precise probability referents, but people can, with moderate reliability, translate them into numerical probability estimates.9 And these implicit quantifiers play fundamental roles in life: they capture—albeit crudely—the strength of the underlying expectancies that guide decision making.
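  To make that translation concrete, here is a minimal sketch in Python. The mapping is purely illustrative: the numeric values are placeholder assumptions, not the empirical translations behind the study cited in note 9.

```python
# Illustrative only: these numeric values are placeholder assumptions,
# not the empirical verbal-to-numeric translations reported in note 9.
VERBAL_TO_NUMERIC = {
    "virtual certainty": 0.95,
    "probably": 0.70,
    "slightly better than even odds": 0.55,
    "50/50": 0.50,
    "somewhat improbable": 0.30,
    "quite unlikely": 0.15,
    "only the remotest chance": 0.02,
}

def to_probability(expression: str) -> float:
    """Translate a vague verbal quantifier into a rough numeric probability."""
    return VERBAL_TO_NUMERIC[expression.lower()]

print(to_probability("probably"))        # 0.7
print(to_probability("quite unlikely"))  # 0.15
```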

  In a nutshell, the notion that people do not reason probabilistically—indeed, the notion they can avoid such reasoning in a stochastic world—is silly.

  Moderate Neopositivist and Reasonable Relativist

  Again, the debate has become overheated. Relativists are right that the question-framing and unpacking effects demonstrate the elusiveness of the Rankean goal of a theory-neutral data language in which we can tell history “as it really was.” Experimental manipulations of starting points—factual versus counterfactual—do leave an indelible imprint on the conclusions that we draw about what was or was not inevitable or impossible. But neopositivists see nothing odd or ironic about using the scientific method to detect sources of bias in inquiry. Objectivity may be unattainable in its pure Platonic form, but that does not mean we should stop trying to move in that direction (no more than we should stop trying to translate poetry). Identifying systematic biases in human cognition is an integral part of the Enlightenment project of extending the reach of reason. The only way scientists can improve their measurement instruments—be they almost infallible atomic clocks or highly fallible human observers—is to be vigilant for sources of error.

  The constructive question is neither relativist nor neopositivist in character. It is pragmatic: when are we better off translating our vague verbal conceptions of probability into a quantitative metric governed by restrictive conventions? If we leave things undisturbed, it will continue to be distressingly difficult to determine which of our intuitions are right or wrong, consistent or inconsistent. The correspondence and coherence benchmarks of rationality used in this book will remain beyond our measurement reach. But if we cross the qualitative-quantitative Rubicon, and get into the habit of affixing exact numbers where once there was only vague verbiage, we gain the opportunity to assess how well rival schools of thought “stack up” against each other as well as against fundamental standards of logical consistency and empirical accuracy. The comparisons will sometimes be uncomfortable, and there will be room for reasonable people to disagree over what the numbers mean. But we will have a framework for learning more about ourselves, about what we do well and what stands in need of correction.
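  What crossing that Rubicon buys us can be shown with a small sketch. The example below assumes forecasts are stated as numeric probabilities over an exhaustive set of outcomes, uses a quadratic (Brier-style) rule as the correspondence test, and uses a simple additivity check as the coherence test. The pundit labels, outcome categories, and numbers are invented for illustration; the quadratic rule is one standard choice, not necessarily the book's exact scoring procedure.

```python
# Hedged sketch: one correspondence test (Brier-style score) and one
# coherence test (additivity), applied to invented forecasts.

def brier_score(forecast: dict[str, float], outcome: str) -> float:
    """Correspondence test: squared error between the probability vector
    and what actually happened. 0.0 is perfect; 2.0 is maximally wrong."""
    return sum((p - (1.0 if event == outcome else 0.0)) ** 2
               for event, p in forecast.items())

def is_coherent(forecast: dict[str, float], tol: float = 1e-9) -> bool:
    """Coherence test: probabilities lie in [0, 1] and sum to 1 across
    mutually exclusive, exhaustive outcomes."""
    return (all(0.0 <= p <= 1.0 for p in forecast.values())
            and abs(sum(forecast.values()) - 1.0) <= tol)

# Two hypothetical pundits forecasting the same exhaustive outcome set.
hedgehog = {"status quo": 0.10, "moderate change": 0.15, "radical change": 0.75}
fox      = {"status quo": 0.35, "moderate change": 0.45, "radical change": 0.20}

observed = "moderate change"
for name, forecast in (("hedgehog", hedgehog), ("fox", fox)):
    print(name, is_coherent(forecast), round(brier_score(forecast, observed), 3))
```

  On this invented data, the fox's hedged forecast earns the lower, and therefore better, score: exactly the kind of stark, sometimes uncomfortable comparison that attaching numbers makes possible.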

  IMPROVING THE QUALITY OF DEBATE IN POLICY CONTROVERSIES

  I have made so many concessions in this project to moderate brands of relativism that I no longer know whether I am better pigeonholed as an “epistemologically liberal neopositivist” or an “epistemologically conservative relativist.”10 Whatever the correct classification, and it scarcely matters, the resulting hybrid position straddles a deep divide. On the one hand, it acknowledges an irreducible pluralism of perspectives on good judgment. There will always be wiggle room for arguing over who got it right. On the other hand, it directs us to hold rival perspectives accountable to fundamental tests of good judgment that, imperfect though they are, permit stark comparisons of relative performance.

  This hybrid framework has guided us through seven chapters, but it does not tell us what to do next. The cautious scientific response would be to wait for the full peer-review verdict on this project and its follow-ups. We would then execute the Humean handoff: remind readers of David Hume’s famous fact-value distinction, declare we have reached the limits of science, and assign you, the readers, the role of ultimate arbiters of what use to put the evidence to as you do your citizens’ duty of deciding who in the public arena does or does not have the right cognitive stuff, of deciding whether to risk being too tough (and punishing pundits for errors that should have been forgiven) or too lenient (and “forgiving” them for errors that deserved punishment). This division of intellectual labor might look like buck-passing, but it is rooted in the principled neopositivist conviction that scientists should not mix their roles as fact gatherers and analysts, where they have a comparative advantage, with their roles as policy advocates, where their opinions merit no greater weight than those of their fellow citizens.

  We could end here. But my preference in the final section of this final chapter is to speak as a citizen, not a social scientist. I will provisionally assume the soundness of the approach to good judgment taken here and make the case that we as a society would be better off if participants in policy debates stated their beliefs in testable forms, monitored their forecasting performance, and honored reputational bets.

  Making this case, however, is impossible without establishing how well off we are now: how effective are existing quality control mechanisms? The traditional answers—from liberal democratic theory—have been reassuring. We can count on open marketplaces of ideas to be self-correcting. Political partisans do not need to be naturally honest score-keepers of their predictive track records if they know that, should they fail to rein in their self-promotional puffery, rival factions will pillory them as dogmatic dunces. We humans do not need to be perfect as long as we are flawed in offsetting ways. In the long term, market forces will winnow out the truth.

  I have long resonated to classical liberal arguments that stress the efficacy of free-for-all exchanges in stimulating good ideas and screening out bad ones.11 But I now see many reasons why the routine checks and balances—in society at large as well as in the cloisters of academe—are not up to correcting the judgmental biases documented here. The marketplace of ideas, especially that for political prognostication, has at least three serious imperfections that permit lots of nonsense to persist for long stretches of time.12

  First, vigorous competition among providers of intellectual products (off-the-shelf opinions) is not enough if the consumers are unmotivated to be discriminating judges of competing claims and counterclaims. This state of affairs most commonly arises when the mass public reacts to intellectuals peddling their wares on op-ed pages or in television studios, but it even arises in academia when harried, hyperspecialized faculty make rapid-fire assessments of scholars whose work is remote from their own. These consumers are rationally ignorant. They do not think it worth their while trying to gauge quality on their own. So, they rely on low-effort heuristics that prize attributes of alleged specialists, such as institutional affiliation, fame, and even physical attractiveness, that are weak predictors of epistemic quality. Indeed, our data—as well as other work—suggest that consumers, especially the emphatically self-confident hedgehogs among them, often rely on low-effort heuristics that are negative predictors of epistemic quality. Many share Harry Truman’s oft-quoted preference for one-armed advisers.13

  Second, the marketplace of ideas can fail not because consumers are unmotivated but rather because consumers have the “wrong” motives. They may be less interested in the dispassionate pursuit of truth than they are in buttressing their prejudices. John Stuart Mill—who coined the marketplace of ideas metaphor—was keenly aware that audiences like listening to speakers who articulate shared views and blast opposing views more compellingly than the audience could for itself. In his chronicle of the decline of public intellectuals, Richard Posner notes that these advocates specialize in providing “solidarity,” not “credence,” goods. The psychological function being served is not the pursuit of truth but rather enhancing the self-images and social identities of co-believers: “We right-minded folks want our side to prevail over those wrong-headed folks.” The psychology is that of the sports arena, not the seminar room. These observations also fit well with our data. Fans should find it much easier to identify with the brave hedgehogs who, unlike the equivocating foxes, do not back off in ideological combat and do not give the other side the satisfaction of savoring their mistakes. Even though we might disavow the sentiment if put too baldly, many of us seem to subscribe to the implicit theory that real leaders do not admit mistakes.14

  Third, even granting that consumers are properly motivated, they can still fail because of either cognitive constraints or task difficulty constraints. Cognitive constraints are rooted in human nature: even smart people have limited mental capacity and make elementary errors of reasoning that are surprisingly resistant to correction by exhortations to “think harder.”15 Task difficulty constraints are rooted in the political environment: no matter how smart people are, it may be impossible to determine—even ex post—which pundits were closer to the truth. There are three intertwined reasons for suspecting this is true of many political controversies: (a) the propensity of partisans to make vague-to-the-point-of-oracular predictions that can be rendered consistent with a wide range of contradictory outcomes and thus never falsified; (b) the ingenuity of partisans, especially hedgehogs, in generating justifications and excuses whenever a respected member of their camp is so rash as to offer a prediction that can be falsified; (c) the inscrutability of the environment, which makes it easy for partisans to pick and choose convenient what-if scenarios in which the policies favored by one’s side always lead to better outcomes than would have otherwise occurred and the policies favored by the other side always lead to worse outcomes.

  This combination of arguments gives ample reasons for fearing that the marketplace for political prognostication will be far from infallibly self-correcting. Figuring out our next move is not, however, easy. The prognostication market is not like that for goods or services in which the consumer can readily gauge or sellers guarantee quality (how often do pundits declare they will forswear punditry if they get it wrong?). The market for political prognostication is also not like those for medical or legal services in which, although consumers cannot readily gauge quality, public anxieties about quality control (abetted by insiders’ less-than-altruistic desire to limit competition) have led to strict oversight of who can offer opinions. The First Amendment would override laws mandating that only state-licensed persons can voice opinions on public policy.

  The obvious corrective to these market imperfections is a collective commitment to furnish public intellectual goods that make it easier to distinguish lower- from higher-quality political speculation. And academia is the obvious place to look for quality control guidance and precedents. Its elaborate systems of peer review represent a concerted effort to enforce norms of epistemic accountability that transcend allegiances to quarreling schools of thought. To obtain grants and to be published in scholarly journals—to get one’s voice heard in the marketplace of ideas among specialists—one must pass through a rigorous gauntlet of anonymous reviewers tasked with checking the soundness of one’s arguments.

  The good news is that such systems do filter out a lot of “noise.” The bad news is that the key filtering mechanism—severely restricting access to publication outlets—is neither constitutionally feasible nor desirable in the broader marketplace of ideas. The added bad news is that existing journals in the social sciences are oriented around highly abstract theoretical controversies in which contributors virtually never stake their reputations to explicit predictions about the types of messy real-world outcomes so central to our forecasting exercises. These contributors are, moreover, right to be reticent given the game they are playing: the ceteris paribus requirement for theory testing can never be satisfied when so many uncontrolled variables are at work and so little is known about how those variables interact.

  In this age of academic hyperspecialization, there is no reason for supposing that contributors to top journals—distinguished political scientists, area study specialists, economists, and so on—are any better than journalists or attentive readers of the New York Times in “reading” emerging situations. The data reported in chapters 2, 3, and 4 underscore this point. The analytical skills undergirding academic acclaim conferred no advantage in forecasting and belief-updating exercises. If these null-hypothesis results capture the true state of nature, it is not surprising there is so much disdain among high-ranking academics for forecasting exercises (the opposite of the attitude I would expect if they thought they held some advantage). One social science colleague told me with ill-concealed contempt: “We leave that for the media mavens.”

  Caveats to the side, my own public-intellectual-goods proposal builds on the principle of rigorous review that prevails in top-ranked academic journals. These journals, like this project in miniature, are offspring of the Enlightenment quest to identify correspondence and coherence benchmarks for judging claims that transcend clashing schools of thought and establish criteria by which civilized people can agree to resolve disagreements—or at least agree on terms for disagreeing. To achieve legitimacy within a political or scholarly community, it is necessary for aspiring public-intellectual-goods providers not only to maintain high evidentiary standards but also to honor basic norms of procedural fairness, including (a) equality of treatment so that representatives of opposing views perceive that the same epistemic ground rules apply to everyone; and (b) responsiveness to protests about the application of standardized rules in cases in which special circumstances allegedly arise.16

  Unlike the precedent of academic journals, however, the proposal advanced here is not centered around evaluating the explanatory strengths and weaknesses of abstract theoretical accounts; the focus is on the capacity of flesh-and-blood observers, drawing on whatever mix of street smarts and academic knowledge they deem optimal, to decode real events unfolding in real time. To this end, observers would be subject to the same bottom-line correspondence and coherence tests of judgment applied in this book. The only permissible deviations from standardization would be those necessary to assure participants from diverse viewpoints that the norms of procedural fairness are being respected. For example, at the beginning of the forecasting exercise, all participating observers would be given the option of specifying whether they wish to avail themselves of difficulty and value adjustments to their forecasting accuracy scores; at the end of the exercise, observers would be given the option of revising those adjustments as well as the opportunity to accept additional modifications such as controversy adjustments (for residual uncertainty over what really happened) and fuzzy-set adjustments (for residual uncertainty over what nearly happened (close-call counterfactuals) or what might yet happen (off-on timing)). Observers could also opt either to keep their results private (and use the resulting feedback purely for cognitive self-improvement) or to go public (demonstrating their willingness to put their reputations on the line).

  Observers would not, however, have infinite wiggle room for covering up mistakes. The performance feedback results would always include—in addition to whatever adjustments observers added—standardized baseline measures of forecasting accuracy and timeliness of belief updating that represent an ideologically balanced panel’s sense of what actually happened, with no moral or metaphysical second-guessing. Consumers of political prognostication could thus decide for themselves whether to rely on the objective forecasting accuracy and belief-updating performance statistics or to be more charitable and accept some or all of the adjustments to those scores proposed either ex ante or ex post by the forecasters themselves. Prudent consumers should become suspicious when they confront big gaps between objective performance indicators and subjectively adjusted indicators. Unadjusted ex ante forecasting performance tells consumers in the media, business, and government what most want to know: how good are these guys in telling us what will happen next? Ex post adjustments to forecasting accuracy tell us how good a job forecasters do, after the damage is done, in closing the gap between what they said would happen and what subsequently did happen.17
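  A performance-feedback report of this kind is easy to sketch. Everything below is hypothetical: the adjustment names echo the ones proposed above, but the discount factors and the way they are combined are placeholder assumptions, not the book's actual scoring machinery.

```python
# Hedged sketch of a feedback report: an objective, unadjusted accuracy score
# shown alongside whatever adjustments the forecaster elected. The adjustment
# names mirror those in the text; the multipliers and the averaging scheme
# are invented for illustration.
from statistics import mean

def unadjusted_accuracy(brier_scores: list[float]) -> float:
    """Baseline ex ante performance: mean Brier score, lower is better."""
    return mean(brier_scores)

def adjusted_accuracy(brier_scores: list[float],
                      adjustments: dict[str, float]) -> float:
    """Apply the forecaster's elected adjustments as simple multiplicative
    discounts on the baseline score (purely illustrative)."""
    score = unadjusted_accuracy(brier_scores)
    for factor in adjustments.values():
        score *= factor
    return score

scores = [1.295, 0.465, 0.810]                # hypothetical per-question Brier scores
elected = {"difficulty": 0.9, "value": 0.85}  # hypothetical elected discounts

print(round(unadjusted_accuracy(scores), 3))        # objective indicator
print(round(adjusted_accuracy(scores, elected), 3)) # charitable indicator
# A large gap between the two numbers is the consumer's cue for suspicion.
```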

  Of course, we have yet to confront the most daunting of all the barriers to implementation: the reluctance of professionals to participate. If one has carved out a comfortable living under the old regime of close-to-zero accountability for one’s pronouncements, one would have to be either exceptionally honest or masochistic to jeopardize so cozy an arrangement by voluntarily exposing one’s predictions to the rude shock of falsification. Powerful inertial forces keep the current system in place. However collectively suboptimal it may be, entrenched incumbents have a big interest in preserving it.

  Big incentives will therefore be required to induce people to work through the sorts of arduous, frequently humbling, cognitive exercises that are the methodological backbone of this book—just as surely as big incentives have been necessary to induce people to surmount the formidable cognitive barriers to entry into every other prestigious profession in our society. To motivate would-be doctors and lawyers to acquire demonstrable competence, we have state licensing boards that aggressively pursue charlatans who offer medical and legal advice without passing through grueling professional hurdles. To motivate providers of other goods and services, society has instituted other protections, including laws on truth in advertising and fraud. But, again, none of these solutions applies here for obvious constitutional reasons.

 
