Expert Political Judgment

by Philip E. Tetlock


  NATIONAL SECURITY AND DEFENSE POLICY

  Should we expect—over the next five or ten years—defense spending as a percentage of central government expenditure to rise, fall, or stay about the same? Should we expect policy changes over the next five to ten years with respect to military conscription, with respect to using military force (or supporting insurgencies) against states, with respect to participation in international peacekeeping operations (contributing personnel), with respect to entering or leaving alliances or perpetuation of status quo, and with respect to nuclear weapons (acquiring such weapons, continuing to try to obtain such weapons, abandoning programs to obtain such weapons or the weapons themselves)?

  SPECIAL-PURPOSE EXERCISES

  These eight exercises included (1) the weapons of mass destruction proliferation exercise (1988) in which forecasters judged the likelihood of twenty-five states acquiring capacity to produce weapons of mass destruction, nuclear or biological, in the next five, ten, or twenty-five years as well as the possibility of states—or subnational terrorist groups—using such weapons; (2) the Persian Gulf War I exercise (fall 1990) in which forecasters took positions on whether there would be a war (and, if so, how long it would last, how many Allied casualties there would be, whether Saddam Hussein would remain in power, and, if not, whether all or part of Kuwait would remain under Iraqi control); (3) the transitions from Communism exercise (1991–1992) that asked for predictions—over the next three, six, or twelve years—of both economic reform (rate of divesting state-owned enterprises; degree to which fiscal and monetary policy fit templates of “shock therapy”) and subsequent economic performance (unemployment, inflation, GDP growth); (4) an exercise on human-caused or -facilitated disasters in the next five, ten, or twenty-five years, including refugee flows, poverty, mass starvation, massacres, and epidemics (HIV prevalence) linked to inadequate public health measures (1992); (5) the EU exercises that focused initially on adoption of the euro (1992–2002, 1998–2008) but that later broadened to address the prospects of former Soviet bloc countries, plus Turkey, in meeting entry requirements; (6) the American presidential election exercises of 1992 and 2000 (Who will win? By how much?); (7) the Internet–New Economy exercise (1999) that focused on the overall performance of the NASDAQ (Is it a bubble? If so, when will it pop?) as well as the revenues, earnings, and share prices of selected “New Economy” firms, including Microsoft, CISCO, Oracle, IBM, HP, Dell, Compaq, Worldcom, Enron, AOL Time Warner, Amazon, and e-Bay; (8) the global-warming exercise that focused on CO2 emissions per capita (stemming from burning fossil fuels and manufacturing cement) of twenty-five states over the next twenty-five years, and on the prospects of states actually ratifying an international agreement (Kyoto Protocol) to regulate such emissions (1996–1997).

  Reality Checks

  We relied on the reference sources listed below to gauge the accuracy of forecasts on each of the following variables.

  CONTINUITY OF DOMESTIC POLITICAL LEADERSHIP

  We derived indicators of stability/change in individual leadership, dominance in legislative bodies, and character of regime from the CIA Factbook (www.odci.gov/cia/publications/factbook/index.html) as well as supplementary sources such as Facts on File. We derived measures of political liberalization from Freedom House: Freedom in the World (wysiwyg://11http://www.freedomhouse…/research/freeworld/2000/index.htm); the annual Amnesty International Report, London; the U.S. State Department’s Human Rights Reports released by the Bureau of Democracy, Human Rights, and Labor (http://www.state.gov/www/global/human_rights/hrp_reports_mainhp.html); and the United Nations’ World Economic and Social Survey (1997). We derived indicators of economic liberalization from James Gwartney, Randall Holcombe, and Robert Lawson, “Economic Freedom and the Environment for Economic Growth,” Journal of Institutional and Theoretical Economics, 155 (4) (December 1999): 1–21. We derived indicators of corruption from Transparency International, 2000 Corruption Perceptions Index (http://transparency.de/documents/cpi/2000/cpi2000.html) and from PRS Group, International Country Risk Guide (various issues).

  GOVERNMENT POLICY AND ECONOMIC PERFORMANCE

  We derived data on GDP growth rates (purchasing power parity [PPP]), unemployment, and inflation from the World Bank, World Development Indicators 2000 (CD-ROM) as well as the Economist Intelligence Unit, various issues. We derived indicators of educational and health expenditures and, for a set of twenty wealthy countries, foreign aid, from the United Nations Development Programme, Human Development Report 2000 (http://www.undp.org/hdr2000), as well as from the World Bank. We derived data on marginal tax rates from Price Waterhouse, Individual Taxes: A Worldwide Summary (various issues); OECD, Economic Surveys (various issues); and L. Bouten and M. Sumlinski, Trends in Private Investment in Developing Countries: Statistics for 1970–1995 (Washington, DC: World Bank, 1996). We derived miscellaneous data on the degree to which states have developed the institutional and legal infrastructure for market economies from the IMF, World Competitiveness Report 2000, as well as from the International Country Risk Guide (various issues). We derived key economic policy indicators—central bank interest rates, central government expenditure as a percentage of GDP (PPP), annual central government operating deficit as a percentage of GDP (PPP), and state-owned enterprises as a percentage of GDP (PPP)—from a combination of the World Bank (WDI’s CD-ROM, various editions), International Monetary Fund annual summaries, International Financial Statistics (various issues) and the Economist Intelligence Unit. We derived data on currency fluctuations against the U.S. dollar and stock market closes from the Economist Intelligence Unit and data on membership in trade pacts from the CIA Factbook. We derived indicators on CO2 emissions from the World Bank’s World Development Indicators.

  NATIONAL SECURITY AND DEFENSE POLICY

  We derived data on nuclear arms control outcomes, unilateral use of military force, participation in multilateral peacekeeping, defense spending (as a percentage of GDP), and entry into/status quo/exit from international alliances and security regimes from the CIA Factbook. We derived data on military conscription from the International Institute for Strategic Studies, The Military Balance (various issues). We derived data on human-caused disasters—famines, refugee flows, massacres—from United Nations, Global Humanitarian Emergencies (New York: UN/ECOSOC, 1997). We relied on the Minorities at Risk project (developed by Ted Gurr)—which monitors the status of, and conflicts among, politically active communal groups in all countries with populations of at least 500,000—for assessments of whether interethnic bloodshed had waxed or waned in “trouble spots.”

  Coding Free-flowing Thoughts

  The analyses of thought protocols drew on two well-validated methods for quantifying properties of thinking styles that, if the cognitive-process account is right, should differentiate better from worse forecasters. The methods assessed: (a) evaluative differentiation—the number, direction, and balance of the thoughts that people generate in support of, or in opposition to, the claim that a particular possible future is likely; and (b) conceptual integration—the degree to which people make self-conscious efforts to resolve the contradictions in their assessments of the likelihood of possible futures.

  The thought coding for evaluative differentiation gauged the degree to which the stream of consciousness flows in one dominant direction. Coders counted the reasons—pro, neutral, or con—that forecasters generated for supposing that the possible future that they had judged most likely would materialize. Intercoder agreement ranged from .74 to .86. We then constructed a ratio for each respondent in which the numerator was the number of pro or con thoughts (whichever was greater) and the denominator was the total number of thoughts. Ratio balance indicators of 1.0 imply that thought is flowing in only one evaluative direction. Ratio balance indicators that approach or fall below 0.5 imply that thought is profoundly conflicted. At 0.5, every thought in support of the view that x will occur can be matched with a thought that either runs in the opposite direction or runs in no discernible direction at all. Scores ranged from 0.39 to 1.0, with a mean of 0.74 and a standard deviation of 0.10. This meant that the average expert generated thoughts favoring his or her most likely scenarios by a ratio of roughly 3:1.
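
  The ratio just described can be illustrated with a minimal sketch. The function name and the thought counts below are hypothetical and are not taken from the study; they only show the arithmetic of dividing the larger of the pro or con counts by the total number of coded thoughts.

```python
def ratio_balance(pro: int, neutral: int, con: int) -> float:
    """Ratio balance indicator: the larger of the pro or con counts
    divided by the total number of coded thoughts."""
    total = pro + neutral + con
    if total == 0:
        raise ValueError("at least one coded thought is required")
    return max(pro, con) / total


# Hypothetical respondent (illustrative counts only):
# 6 pro thoughts, 1 neutral thought, 1 con thought.
rbi = ratio_balance(pro=6, neutral=1, con=1)
print(rbi)  # 0.75
```

  A score of 0.75 sits close to the reported mean of 0.74. Note that neutral thoughts enter only the denominator, which is why a score can fall below 0.5.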

  The procedure for assessing conceptual integration drew on the widely used integrative complexity coding system.1 We singled out three indicators of the extent to which experts reflected on the problems of managing tensions between contradictory arguments:

  1. Do forecasters consider each causal connection piecemeal (low integration) or do they think “systemically” about the connections among causes (high integration)? Systemic thinking acknowledges the possibility of ripple effects that slowly work through networks of mediating connections or of positive or negative feedback loops that permit reciprocal causation in which A causes B and B, in turn, causes A. Do forecasters try to capture the logic of strategic interdependence between key players in the political game? For example, forecasters could do so by analyzing the incentives for each player to pursue specific strategies given what others are doing, in the process drawing conclusions about whether they are observing games with single or multiple equilibrium “solutions” and about whether the equilibria are of the pure-strategy or mixed-strategy type. Forecasters could also do so by considering the possibility that players are constrained by the logic of two-level games in which moves that players make in one game (say, in international negotiations) must count as moves they make in a completely separate game with different competitors (say, in the domestic struggle for electoral advantage).

  2. Do forecasters acknowledge that decision makers have to grapple with trade-offs in which they must weigh core values against each other (high integration)? Do forecasters recognize that perspectives on what seems an acceptable trade-off might evolve and, if so, do they identify factors that might affect the course of that evolution?

  3. Do forecasters acknowledge that sensible people, not just fools or scoundrels, could view the same problem in clashing ways (high integration)? Do they explore—in a reasonably nonjudgmental way—the root cultural or ideological causes of the diverging perceptions of conflicting groups?

  As with the ratio balance indicator, we computed an index that took into account the total number of thoughts generated (and thus controlled for long-windedness). This procedure counted the integrative cognitions in the texts and divided by the total number of thoughts. Intercoder agreement ranged from .72 to .89. Integrative cognitions were quite rare, so the integrative-cognition ratios ranged from 0.0 to 0.21, with a mean of 0.11 and a standard deviation of 0.05. Only 16 percent of experts qualified as “integrative” by our scoring rules.

  Evaluative differentiation is a necessary but not sufficient condition for integration, so it should be no surprise that integration was correlated with evaluative differentiation (r = 0.62). Given this correlation, it should also be less than astonishing to learn that the integration and ratio balance indicators (RBI) have the same profile of correlates. To simplify later mediational analyses, we (a) reversed the scoring of ratio balance so that now the higher the score, the more evaluatively differentiated (and less lopsided in favor of one position) the forecaster’s arguments (this just involved computing the value of (1 – RBI) for each respondent); (b) standardized both the revised ratio balance (1 – RBI) and integration-cognition indicators and added them to produce a composite indicator that will go by the name of integrative complexity.
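
  The two-step construction of the composite can be sketched as follows. This is an illustration under stated assumptions: the text does not say whether a population or sample standard deviation was used for standardizing (the population form is assumed here), and the function names and scores are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1).
    Assumption: population standard deviation; the source does not specify."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("scores must vary to be standardized")
    return [(v - m) / s for v in values]


def integrative_complexity(rbi_scores, integration_ratios):
    """Composite: z-score of (1 - RBI) plus z-score of the integration ratio."""
    reversed_balance = [1 - r for r in rbi_scores]  # step (a): reverse scoring
    z_balance = standardize(reversed_balance)       # step (b): standardize ...
    z_integration = standardize(integration_ratios)
    return [b + i for b, i in zip(z_balance, z_integration)]  # ... and add


# Hypothetical scores for three respondents (not data from the study).
rbi = [0.90, 0.75, 0.60]
integration = [0.00, 0.10, 0.20]
composite = integrative_complexity(rbi, integration)
print([round(c, 2) for c in composite])
```

  In this made-up sample the respondent with the most lopsided thoughts and no integrative cognitions receives the lowest composite score, and the most balanced, most integrative respondent receives the highest.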

  II. BAYESIAN BELIEF-UPDATING EXERCISES (CHAPTER 4)

  The goal here shifts from assessing “who got what right” to assessing how experts react to the apparent confirmation or disconfirmation of their forecasts.

  Respondents

  All participants in belief-updating exercises (n = 154) were drawn from subgroups that (a) participated in the regional forecasting exercises described earlier; and (b) qualified as “experts” in one of the eleven topics where belief-updating measures were obtained. These topics included the Soviet Union (1988), South Africa (1988), the Persian Gulf War of 1991, Canada (1992), Kazakhstan (1992), the U.S. presidential election of 1992, and the European Monetary Union (1992), as well as a different format in four other domains, including the European Monetary Union (1998), China, Japan, and India.

  Research Procedures and Materials

  EX ANTE ASSESSMENTS

  Respondents were told: “We want to explore in greater depth the views of subject matter experts on the underlying forces—political, economic, cultural, etc.–that are likely to shape the future of [x].” We then posed variants of the following questions:

  a. “How confident are you in the correctness of your assessment of the underlying forces shaping events in [x]?” (Respond on a 0 to 1.0 likelihood scale, anchored at 1.0 [completely confident that point of view is correct], at 0.5 [50/50 chance], and at 0 [completely confident that point of view is wrong]).

  b. “In many domains we study, experts often feel there are other schools of thought that need to be taken into account. Think of the most influential alternative view to the one you currently hold. How likely do you feel it is that this alternative position might be correct?” Experts were asked to make sure that the likelihood assigned to questions (a) and (b) summed to 1.0. Experts who felt there was more than one major alternative view were asked to assign likelihoods to these alternative views being correct and to make conditional probability judgments (described later) for these views as well.

  c. “Assuming your assessment of the underlying forces shaping events in [x] is correct and continues to be correct, please try to rank order the following scenarios in order of likelihood of occurrence (from most to least likely). If you feel that you just cannot distinguish the likelihood of two or more scenarios, feel free to specify a tie.” After the initial rank ordering, respondents then assigned a subjective probability to each scenario. The scenarios were designed to be exclusive and exhaustive, so experts were asked to ensure that the subjective probabilities they assigned to each scenario summed to 1.0. Respondents were told that if they felt an important “possible future” had been left off the list of scenarios, they should feel free to insert it (few did so, and the requirement of summation of subjective probabilities to unity remained). Respondents were reminded that the subjective probabilities assigned to the option chosen as most likely should always be equal to, or greater than, the point on the likelihood scale labeled as maximum uncertainty (in the two-scenario case, 0.5; in the three-scenario case, 0.33; and so forth). The instructions also stressed that “it is perfectly acceptable to assign guessing confidence to each scenario if you feel you have absolutely no basis for concluding that one outcome is any more likely than another.”

  d. “For sake of argument, assume now that your understanding of the underlying forces shaping events is wrong and that the most influential alternative view in your professional community is correct.” Experts then repeated the tasks in case (c).

  e. “Taking all the judgments you have just made into consideration, what is your best bottom-line probability estimate for each possible future?”
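
  One way to read questions (a) through (e) together: the likelihoods assigned to the competing views, combined with the scenario probabilities conditional on each view, imply a bottom-line probability for each scenario through the law of total probability. The sketch below shows only that arithmetic. The function names and all numbers are hypothetical, and nothing here implies that experts' stated answers to (e) actually matched the implied values.

```python
import math


def check_distribution(probs, label):
    """Raise if probabilities are out of range or do not sum to 1.0."""
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError(f"{label}: probabilities must lie in [0, 1]")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError(f"{label}: probabilities must sum to 1.0")


def implied_bottom_line(view_probs, scenario_probs_by_view):
    """Law of total probability:
    P(scenario) = sum over views of P(view) * P(scenario | view)."""
    check_distribution(view_probs, "views")
    for i, probs in enumerate(scenario_probs_by_view):
        check_distribution(probs, f"scenarios under view {i}")
    n_scenarios = len(scenario_probs_by_view[0])
    return [
        sum(pv * probs[s] for pv, probs in zip(view_probs, scenario_probs_by_view))
        for s in range(n_scenarios)
    ]


# Hypothetical expert (illustrative numbers only).
view_probs = [0.8, 0.2]      # questions (a) and (b): own view, main alternative
given_own = [0.7, 0.2, 0.1]  # question (c): three scenarios if own view is right
given_alt = [0.2, 0.3, 0.5]  # question (d): same scenarios if alternative is right
bottom_line = implied_bottom_line(view_probs, [given_own, given_alt])
print([round(p, 2) for p in bottom_line])  # [0.6, 0.22, 0.18]
```

  The validation step mirrors the instructions given to respondents: each set of probabilities must sum to 1.0 before the bottom line is computed.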

  In the Soviet case, possible futures included a strengthening, a reduction, or no change in Communist Party control; for South Africa, movement toward more repressive white minority control, continuation of the status quo, and major movement toward black majority rule; for Kazakhstan, a decline, no change, or an increase in interethnic violence; for Canada, the formal secession of Quebec, continuation of the constitutional status quo, or a new successful effort (commanding the assent of all ten provinces and the federal government) to work out an intermediate “special status” solution of autonomy within confederation; for the European Monetary Union, abandonment of the goal of a common currency, serious delays (in the order of several years, with several major countries “falling out” to varying degrees) or successful progress toward the goal exactly on or close to schedule; for the Persian Gulf crisis, war (brief or protracted), or no war (which could take the form of negotiated compromise or continued confrontation); and for the U.S. presidential elections, victory for one of the candidates of 1992 (George H. W. Bush vs. Bill Clinton vs. Ross Perot) and of 2000 (George W. Bush vs. Al Gore).

  An alternative procedure for eliciting reputational bets depersonalized the process so that experts were no longer pitted against their rivals. For example, in addition to asking experts on Western Europe in 1998 to judge the likelihood of countries adopting the euro in the next three to five years, we asked them to judge the truth or falsity of the hypothesis that “there is a long-term process of economic and political integration at work in Europe” and then make two sets of conditional-likelihood judgments: (a) assume the hypothesis is definitely (100%) true and then judge the likelihood of countries adopting the euro in three to five years; (b) assume the opposite and make the same likelihood judgments.
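
  A prior for the hypothesis and the two conditional likelihoods are exactly the inputs that Bayes' rule needs to say how far confidence in the hypothesis should move once the outcome is known. The sketch below assumes that the truth-or-falsity judgment was expressed as a probability. The function name and the numbers are hypothetical, and this is an illustration of the rule rather than the author's scoring procedure.

```python
def posterior_hypothesis(prior, p_outcome_if_true, p_outcome_if_false):
    """Bayes' rule: P(H | outcome) from P(H) and the two conditional likelihoods."""
    numerator = prior * p_outcome_if_true
    evidence = numerator + (1 - prior) * p_outcome_if_false
    if evidence == 0:
        raise ValueError("the outcome had zero probability under both assumptions")
    return numerator / evidence


# Hypothetical expert (illustrative numbers only):
# prior confidence in the integration hypothesis = 0.7,
# P(country adopts the euro | hypothesis true)  = 0.8,
# P(country adopts the euro | hypothesis false) = 0.4.
posterior = posterior_hypothesis(0.7, 0.8, 0.4)
print(round(posterior, 3))  # 0.824
```

  With these numbers, an expert who then observed the country adopting the euro would be expected to raise confidence in the hypothesis from 0.7 to about 0.82.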

  EX POST ASSESSMENTS

  After the specified forecasting interval had elapsed, we recontacted the original forecasters (reinterview rate between 61 percent and 90 percent depending on exercise—average of 71 percent). In six regional forecasting exercises, we first assessed experts’ ability to recall their original answers (data used in the six hindsight studies reported in chapter 4) and then diplomatically reminded them of their original forecasts and presented them with a Taking Stock Questionnaire that posed nine questions to which experts responded on nine-point scales, anchored at 1 by “strongly disagree” and 9 by “strongly agree,” with 5 anchored as “completely unsure.” In the other five exercises, we simply reminded experts of their forecasts (“our records indicate …”) and went directly to the Taking Stock Questionnaire, which invited experts to look back on their original forecasts and on what had subsequently happened and to rate their agreement or disagreement with the following propositions:

 
