PART B: LOGICAL-COHERENCE AND PROCESS INDICATORS OF GOOD JUDGMENT
We shift here to tests of good judgment that focus not on the empirical accuracy of judgments but rather on the logical defensibility of those judgments. Among other things, good judges should respect each of the following formal principles of probability theory:
a. The additive rule that defines the probability of either of two mutually exclusive events occurring: p(A ∪ B) = p(A) + p(B). If we also stipulate exhaustiveness: p(A ∪ B) = p(A) + p(B) = 1. Subsections I and II of part B, which deal with the power of belief unpacking to inflate subjective probabilities and to warp inevitability and impossibility curves, describe violations of this rule.
b. The multiplicative rule that defines the joint probability of two independent events: p(A ∩ B) = p(A)p(B). More generally, if we allow for the possibility the events are not independent: p(A ∩ B) = p(A/B)p(B) = p(B/A)p(A). Subsection III, which deals with the power of scenarios to inflate subjective probabilities, describes violations of this rule.
c. Bayes’s theorem builds on these identities to define the probability of an outcome (D) conditional on alternative exclusive and exhaustive hypotheses (H and ~H) being true: p(D) = p(D/H)p(H) + p(D/~H)p(~H). Subsection IV, which deals with egocentricity gaps between the likelihoods people actually attach to outcomes and those they would if they used the full formula, describes violations of this rule.
d. Bayes’s theorem further builds on these identities to define the probability of a hypothesis (H) conditional on an outcome (D) as:
p(H/D) = p(D/H)p(H) / [p(D/H)p(H) + p(D/~H)p(~H)]
In odds form, Bayes’s theorem tells us how much confidence people should retain in the relative validity of two hypotheses once they learn outcome D occurred. Their confidence should be a function of the prior odds they placed on each hypothesis being true (the ratio p(H)/p(~H)) and a function of the conditional likelihoods they placed on the outcome assuming the truth of each hypothesis (the likelihood ratio or “reputational bet,” p(D/H)/p(D/~H)):
p(H/D)/p(~H/D) = [p(D/H)/p(D/~H)] × [p(H)/p(~H)]
Subsection V, which deals with failures of belief updating (failures to honor reputational bets), describes violations of this rule.
I. Violations of the Additive Rule: Belief-unpacking Effects (Chapter 7)
Amos Tversky’s support theory predicts when people will quite routinely violate the additivity rule by making “sub-additive” judgments.5 Support theory posits that the judged likelihood of a hypothesis A is a monotonic function of the strength of the arguments, s(A), that people can muster for the hypothesis. The judged probability that hypothesis A rather than B holds, P(A, B), assuming only one can be true, is
P(A, B) = s(A) / [s(A) + s(B)]
The theory also posits that unpacking the description of an event A (e.g., victory in a baseball game) into its disjoint components, A1 ∪ A2 (e.g., victory by one run or victory by more than one run), generally increases its support and that the sum of the support linked to component hypotheses must be at least as large as the support for their explicit disjunction, so that
s(A) ≤ s(A1 ∪ A2) ≤ s(A1) + s(A2)
assuming that (A1, A2) is a partition of A. The psychological rationale is that unpacking reminds us of possibilities, and of evidence for possibilities, that we otherwise would have overlooked. The judged likelihood of the “whole set” can thus often be less than the sum of its parts. Unpacking can magnify the sorts of anomalies portrayed in figure A.6.
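The support-theory account above can be illustrated with a minimal sketch. It uses the standard support-theory ratio, P(A, B) = s(A) / [s(A) + s(B)], and every support value below is hypothetical, chosen only so that unpacking raises support without exceeding the sum of the component supports.

```python
def judged_probability(s_focal: float, s_alternative: float) -> float:
    """Support-theory ratio: P(A, B) = s(A) / (s(A) + s(B))."""
    return s_focal / (s_focal + s_alternative)


# Hypothetical support values (illustrative only, not taken from the text).
s_win_packed = 4.0      # s(A): "victory", described as a whole
s_win_by_one = 2.5      # s(A1): "victory by one run"
s_win_by_more = 3.0     # s(A2): "victory by more than one run"
s_win_unpacked = 5.0    # s(A1 or A2): the explicit disjunction
s_loss_packed = 4.0     # s(B): "defeat", described as a whole
s_loss_unpacked = 5.0   # support for "defeat" once it is unpacked too

# The ordering support theory posits: s(A) <= s(A1 or A2) <= s(A1) + s(A2).
ordering_holds = s_win_packed <= s_win_unpacked <= s_win_by_one + s_win_by_more

# Judged probability of victory with and without unpacking the focal event.
p_win_packed = judged_probability(s_win_packed, s_loss_packed)
p_win_unpacked = judged_probability(s_win_unpacked, s_loss_packed)

# If each side is unpacked when it is the focal hypothesis, the two
# judgments sum to more than 1, which violates the additive rule.
p_loss_unpacked = judged_probability(s_loss_unpacked, s_win_packed)
total = p_win_unpacked + p_loss_unpacked

print(f"packed: {p_win_packed:.3f}, unpacked: {p_win_unpacked:.3f}, sum: {total:.3f}")
```

With these made-up numbers the judged probability of victory rises once the event is unpacked, and the two unpacked judgments together exceed 1.0, which is the signature of sub-additivity.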
II. Further Violations of Additivity: Analytical Framework for Impossibility and Inevitability Curves
Figures 7.8 and 7.9 showed the impact on probability judgments of encouraging observers to unpack counterfactual possibilities into progressively more specific (easily imagined) sub-possibilities. The area between the two impossibility curves represented the power of unpacking to inflate likelihood judgments across time.
In figures 7.8 and 7.9, the data for the impossibility curves with and without unpacking consisted of points (xi, yi) where the xi’s were dates and the yi’s were subjective probabilities. The impossibility curve with no unpacking shows that probability judgments of those counterfactual alternatives to reality can be best approximated as a lower-order polynomial function of time:
Figure A.6. The x-axis shows the probabilities of three exclusive and exhaustive possibilities: A (.1), B (.1), and C (.8). The figure illustrates how sub-additive judgments result from probability-weighting functions that incorporate the principle of diminishing sensitivity, in which the decision weight assigned to events falls off sharply with departures from the natural reference-point boundaries of zero and 1.0. For instance, A’s probability of .1 translates into the decision weight w(.1); B also has a probability of .1, which would receive the decision weight w(.1) if judged by itself but only w(.2)–w(.1) when judged as an addition to A. Thus when we compute the decision weight for the likelihood of either A or B (probability of .2), the resulting w(.2) is far smaller than w(.1)+w(.1). In a similar vein, C has a probability of .8, which translates into the decision weight w(.8) when we judge it by itself. But when we compute the decision weight for the likelihood of either C or A, or C or B (probability of .9), the value w(.9) is far smaller than w(.8)+w(.1).
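The comparisons in the Figure A.6 caption can be checked numerically with a small sketch. The caption does not give a functional form for w, so the code assumes a common one-parameter inverse-S weighting function with curvature 0.61; both the form and the parameter are assumptions made for illustration.

```python
def weight(p: float, gamma: float = 0.61) -> float:
    """Inverse-S probability-weighting function (assumed form, not from the text):
    w(p) = p^gamma / (p^gamma + (1 - p)^gamma)^(1 / gamma)."""
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


w_a = weight(0.1)         # A or B judged alone
w_a_or_b = weight(0.2)    # A or B judged together
w_c = weight(0.8)         # C judged alone
w_c_or_a = weight(0.9)    # C or A (equivalently C or B) judged together

print(f"w(.1) + w(.1) = {2 * w_a:.3f}   versus   w(.2) = {w_a_or_b:.3f}")
print(f"w(.8) + w(.1) = {w_c + w_a:.3f}   versus   w(.9) = {w_c_or_a:.3f}")
```

Under this assumed function, the weight of each disjunction is smaller than the sum of the weights of its parts judged separately, which is the pattern the caption describes.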
The impossibility curves with unpacking of counterfactual alternatives to reality can be best approximated as a higher-order polynomial function of time:
Having approximated the two separate functions, we can now calculate the area of the region by integrating both functions:
To obtain the area of the shaded region, we simply compute the difference of the two areas.
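The area calculation can be sketched as follows. The fitted coefficients are not reproduced here, so the two polynomials below are hypothetical stand-ins (a first-order curve without unpacking and a second-order curve with unpacking), and time is assumed to be rescaled to the interval from 0 to 1.

```python
def integrate_polynomial(coeffs, lower, upper):
    """Exact definite integral of sum(c_k * x**k) over [lower, upper].

    `coeffs` lists coefficients from the constant term upward.
    """
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))

    return antiderivative(upper) - antiderivative(lower)


# Hypothetical fitted curves (illustrative coefficients only).
no_unpacking = [0.10, 0.80]            # 0.10 + 0.80x
with_unpacking = [0.30, 1.50, -0.90]   # 0.30 + 1.50x - 0.90x^2

start, end = 0.0, 1.0                  # assumed rescaled time range

area_no_unpacking = integrate_polynomial(no_unpacking, start, end)
area_with_unpacking = integrate_polynomial(with_unpacking, start, end)

# The shaded region is the difference between the two areas.
shaded_area = area_with_unpacking - area_no_unpacking
print(f"shaded area = {shaded_area:.3f}")
```

Because polynomials have closed-form antiderivatives, no numerical quadrature is needed; only the fitting step, which is omitted here, requires the underlying data points.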
III. Violations of the Multiplicative Rule: Scenario Effects (Chapter 7)
Whenever we construct scenarios with contingent linkages between events (A probably leads to B, which probably leads to C), we should be alert to the possibility that people are not adequately taking into account how rapidly the likelihood of the “whole” diminishes as links are added.
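A short sketch shows how quickly the likelihood of a multi-link scenario falls under the multiplicative rule. The per-link probability of 0.7 is an illustrative value, and each link is treated as a probability conditional on the links before it.

```python
from math import prod


def chain_probability(link_probabilities):
    """Joint probability of a scenario whose links are conditional
    probabilities: p(A) * p(B/A) * p(C/A and B) * ..."""
    return prod(link_probabilities)


# Illustrative case: every link in the chain is judged 70 percent likely.
for n_links in range(1, 6):
    joint = chain_probability([0.7] * n_links)
    print(f"{n_links} link(s): joint probability = {joint:.3f}")
```

Even when every individual link looks probable, a scenario with several links quickly becomes less likely than not.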
Figure A.7 illustrates the number of logical constraints that people would need to satisfy if they were to judge the Canadian-futures scenarios in a logically consistent manner. Of course, as chapter 7 showed, people often flouted these constraints by assigning far more probability to the lower-level branches than they should have. For example, the probability of the lowest and leftmost node, A1B1C1, should equal P(A1)P(B1)P(C1) if A, B, and C are independent. Even if each event had a probability of .7, the joint likelihood of all three would be .7³ = .343. Probabilities fall less steeply insofar as A, B, and C are correlated, but usually still more rapidly than do forecasters’ probability estimates of multiple-outcome possibilities. The net result is thus also violations of the additivity rule: in particular, big sub-additivity effects in which, for example:
IV. Violations of the Definitions of the Probability of an Event (Chapter 4)
The violations of the definition of the probability of an event in the previous equation arose because observers often estimated p(A) to be essentially the same as p(A/B), where A refers to experts’ most likely futures and B refers to their favorite working hypotheses about the underlying drivers of events. The result was an egocentricity gap in which observers wound up assigning significantly more likelihood to their “most likely futures” (p(A)) than they would have if they had taken other relevant factors into account (that is, if they had taken seriously the possibility that their favorite hypothesis, B, might be false).
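The egocentricity gap can be made concrete with a minimal sketch that compares the shortcut estimate, p(A/B), with the full formula, p(A) = p(A/B)p(B) + p(A/~B)p(~B). The input values are hypothetical, and the sign convention (full-formula value minus shortcut, so that overestimation appears as a negative gap) is an assumption.

```python
def total_probability(p_a_given_b, p_a_given_not_b, p_b):
    """p(A) = p(A/B)p(B) + p(A/~B)p(~B), with B and ~B exhaustive."""
    return p_a_given_b * p_b + p_a_given_not_b * (1.0 - p_b)


def egocentricity_gap(p_a_given_b, p_a_given_not_b, p_b):
    """Full-formula p(A) minus the shortcut p(A/B) (assumed sign convention).

    A negative value means the shortcut overstates the likelihood of A.
    """
    return total_probability(p_a_given_b, p_a_given_not_b, p_b) - p_a_given_b


# Hypothetical expert: the most likely future is 70 percent likely if the
# favorite hypothesis is right, 30 percent likely if the rival is right,
# and the favorite hypothesis is itself given 70 percent credence.
gap = egocentricity_gap(p_a_given_b=0.7, p_a_given_not_b=0.3, p_b=0.7)
print(f"egocentricity gap = {gap:.2f}")
```

Algebraically the gap equals -[p(A/B) - p(A/~B)] × p(~B), so under this convention it widens as the two conditional likelihoods diverge and narrows as confidence in the favorite hypothesis approaches 1.0.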
Figure A.7. The complexity of the probabilistic reasoning in which people must engage, and the logical constraints they must respect, to arrive at logically consistent probability estimates for both the bottom-line prediction in the Canadian futures exercise in chapter 7 (will Canada hold together [A1] or fall apart [A2]?) and the various subsidiary possibilities (economic upturn [B1] or downturn [B2], as well as the outcome of the next Quebec election: separatists win [C1] or lose [C2]). For example, P(C1 | B2, A1)P(B2 | A1)P(A1)+P(C2 | B2, A1)P(B2 | A1)P(A1) must equal P(B2 | A1)P(A1) and, in turn, P(B2 | A1)P(A1)+P(B1 | A1)P(A1) must equal P(A1), and of course P(A1) + P(A2) must equal 1.0.
Figure A.8. How the egocentricity gap, produced by estimating p(D) solely from p(D/H), has the theoretical potential to grow as a function of the extremity of the likelihood ratio (all possible combinations of numerator values of .5, .7, and .9 and denominator values of .1, .3, and .5, yielding ratios from 1:1 to 9:1) and the cautiousness of the prior-odds ratio (values for p(H) ranging from .5 to .7 to .9, yielding odds ratios from 1:1 to 9:1). There is the greatest potential for large gaps when experts offer extreme likelihood-ratio judgments and cautious prior-odds ratio judgments. The maximum potential sizes of the egocentricity gap for hedgehogs and foxes are roughly equal; the actual sizes of the egocentricity gaps reveal foxes to be less susceptible to the effect than hedgehogs.
The sloping diagonal lines in Figure A.8 highlight two factors that moderate the magnitude of the gap: the extremity of two ratios, the prior-odds ratio (experts’ subjective probability estimates of their preferred hypothesis about the underlying drivers of events proving correct, divided by their estimates of the most influential alternative hypothesis proving correct) and the likelihood ratio (experts’ subjective probabilities that their most likely future will occur if their preferred hypothesis proves correct, divided by their subjective probability of that future occurring if the rival hypothesis proves correct).
Figure A.8 also shows: (a) there is greater potential for large egocentricity gaps when the likelihood ratio, p(D/H)/p(D/~H), rises above 1 (reflecting increasingly confident reputational bets). This mathematical necessity should lead us to expect wider gaps among the more intellectually aggressive hedgehogs than among the more cautious foxes (likelihood ratios for foxes’ most likely futures hovered around 2.3:1, whereas hedgehogs’ ratios hovered around 3.2:1); (b) the potential size of egocentricity gaps shrinks as p(H) rises from .5 to 1.0 and the prior-odds ratio, p(H)/p(~H), rises accordingly, reflecting increasing confidence in the correctness of one’s worldview. This mathematical necessity should lead us to expect narrower gaps among the self-assured hedgehogs than among the more tentative foxes (prior-odds ratios for foxes hovered around 2.2:1, whereas those for hedgehogs hovered around 3.1:1); (c) the offsetting effects in (a) and (b), coupled to the actual differences in likelihood ratios and prior-odds ratios offered by foxes and hedgehogs, imply that if foxes and hedgehogs were equally likely to rely on p(D/H) as a heuristic for estimating the value of p(D), the actual egocentricity gap would be slightly greater for foxes than for hedgehogs.
Figure A.8 shows that: (a) the predicted values of the egocentricity gaps for hedgehogs and foxes are roughly equal (-.16 versus -.149); (b) the actual values of the gap are substantially larger for hedgehogs (-.12) than for foxes (-.07). This result is captured by the steeper rise in the fox circles than in the hedgehog triangles and is consistent with the hypothesis that foxes are less likely to rely on the “consider only my own perspective” heuristic in affixing likelihoods to those futures they judge most likely to occur.
If we replaced experts’ actual predictions in all the regional forecasting exercises in section I with those they would have made if they had shown zero egocentricity gaps in the preliminaries to the belief-updating exercises in section II, there would have been considerable shrinkage in the subjective probability–objective reality gaps, with less overestimation of the likelihood of “most likely futures.” Foxes’ probability-reality gaps would shrink by approximately 18 percent and hedgehogs’ gaps would shrink by approximately 32 percent. The fox-hedgehog performance differential would obviously also shrink, but it would remain statistically significant.
V. Violations of Belief-updating Rule
We relied on reputational bets (likelihood ratios) elicited at time 1 to assess how strong a Bayesian obligation experts would feel to change their minds at time 2 when they learn whether their most likely future did or did not materialize.
The Bayesian belief-updating formula in chapter 4 tells us how much one should change one’s mind about the validity of competing hypotheses when one confronts evidence that one once thought had probative value for distinguishing those hypotheses. Figure A.9 illustrates how much a card-carrying Bayesian should increase or decrease confidence in a hypothesis when the incoming evidence is weakly, strongly, or extremely strongly diagnostic (likelihood ratios of 0.6/0.4 in the first case, 0.8/0.2 in the second, and 0.95/0.05 in the third). The curves rise much more slowly in response to weaker evidence (likelihood ratio closer to 1.0), and there is more room for rapid upward movement for hypotheses that start from a low baseline prior (say, .1) than from a high one (say, .9, where ceiling effects limit potential change).
Extreme probability assignments, such as assigning 1.0 or total confidence to one’s prior hypothesis and zero or no confidence to rival perspectives, create problems within this framework. Key terms, such as the prior-odds ratio, become undefined when forecasters declare unshakeable confidence. When such problems arose, we used a replacement value of .95/.05.
There is a straightforward procedure for computing the discrepancy between how much experts update their beliefs and how much Bayes’s theorem says they should. The Bayesian prescription for belief change can be obtained in three steps. First, calculate the ex ante likelihood ratio, which is done by dividing the expert’s original assessment of the conditional likelihood of each scenario, assuming the correctness of that expert’s understanding of underlying forces, by the expert’s original assessment of the conditional likelihood of the same scenario, but now assuming the correctness of the most influential alternative view of the underlying forces. Second, calculate the prior-odds ratio, which is done by dividing the subjective probability that experts placed in their understanding of the underlying forces by the subjective probability that experts placed in the most influential rival view of those underlying forces. And third, multiply the prior-odds ratio by the likelihood (diagnosticity) ratio for each respondent’s forecasts to yield the posterior-odds ratio, which tells us the relative likelihood of the two hypotheses in light of what we now know has happened.
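The three steps can be sketched in a few lines. The clamping of extreme inputs to the range .05 to .95, the conversion of posterior odds into a probability (which treats the two hypotheses as exhaustive), and all the numbers in the example are assumptions made for illustration.

```python
def clamp(p: float, floor: float = 0.05, ceiling: float = 0.95) -> float:
    """Replace extreme values such as 0 or 1 so that ratios stay defined
    (assumed handling, modeled on a .95/.05 replacement value)."""
    return min(max(p, floor), ceiling)


def prescribed_posterior(p_outcome_given_own: float,
                         p_outcome_given_rival: float,
                         p_own: float,
                         p_rival: float) -> float:
    """Bayesian prescription for belief in one's own hypothesis at time 2."""
    # Step 1: ex ante likelihood ratio (the reputational bet).
    likelihood_ratio = clamp(p_outcome_given_own) / clamp(p_outcome_given_rival)
    # Step 2: prior-odds ratio.
    prior_odds = clamp(p_own) / clamp(p_rival)
    # Step 3: posterior-odds ratio.
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds to a probability, assuming the two hypotheses are exhaustive.
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical expert who lost a reputational bet: the outcome was rated
# three times more likely under the rival view than under the expert's own.
prescribed = prescribed_posterior(0.2, 0.6, p_own=0.7, p_rival=0.3)
actual_time_2_belief = 0.65   # hypothetical self-report after the outcome
shortfall = actual_time_2_belief - prescribed

print(f"prescribed {prescribed:.4f}, actual {actual_time_2_belief:.2f}, "
      f"discrepancy {shortfall:.4f}")
```

A positive discrepancy in this sketch means the expert retained more confidence in the original hypothesis than the reputational bet licensed.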
Just as fuzzy-set adjustments can “correct” probability scores by giving forecasters credit for being almost right, the same can be done for belief-updating equations. In the latter case, though, the correction operates on the likelihood ratio. We might, for example, allow losers of a reputational bet to lower the likelihood ratio (bring it closer to unity so that the outcome of the bet has weaker implications for the correctness of any point of view) in proportion to the frequency with which forecasters offered belief system defenses, in proportion to the credibility weights one assigns the defenses, and in proportion to how self-servingly forecasters offered the defenses. Imagine forecasters have just lost a reputational bet: an outcome has just occurred that they earlier rated as three times more likely if their rivals’ point of view was correct than if their own point of view was correct. This 3:1 ratio could be reduced in proportion to the frequency with which experts claimed other things almost happened or might still happen. Thus a 3:1 ratio could be cut in half if forecasters offered such defenses 50 percent of the time and we grant 100 percent credibility to those claims. But this halving of the original likelihood ratio can be quickly reversed if we adjust the fuzzy-set correction in proportion to experts’ tendency to offer more “excuses” when they are on the losing versus winning side of reputational bets. If experts “request” fuzzy-set adjustments nine times more often when they get it wrong, experts would get the benefit of only one-ninth of the original 50 percent reduction in the likelihood ratio (a reduction of roughly 5.6 percent).
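The arithmetic of this adjustment can be sketched as follows. The function, its parameter names, and the floor that stops the adjusted ratio from falling below 1:1 are assumptions made for illustration; the passage describes the adjustment only in words.

```python
def adjusted_likelihood_ratio(ratio: float,
                              defense_frequency: float,
                              credibility: float = 1.0,
                              self_serving_factor: float = 1.0) -> float:
    """Shrink a likelihood ratio in proportion to how often defenses were
    offered and how credible they are, discounted by how much more often
    they were offered after losing than after winning (hypothetical form)."""
    reduction = defense_frequency * credibility / self_serving_factor
    # Assumed floor: the adjustment weakens the bet but never reverses it.
    return max(1.0, ratio * (1.0 - reduction))


# Defenses offered 50 percent of the time and granted full credibility.
halved = adjusted_likelihood_ratio(3.0, defense_frequency=0.5)

# Same defenses, but offered nine times more often after losing a bet.
discounted = adjusted_likelihood_ratio(3.0, defense_frequency=0.5,
                                       self_serving_factor=9.0)

print(f"3:1 becomes {halved:.2f}:1, or {discounted:.2f}:1 once self-serving "
      f"use of the defenses is discounted")
```

In this sketch the self-serving discount leaves the original 3:1 bet almost intact, which mirrors the reversal described in the text.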
Figure A.9. The impact of repeatedly presenting weak, strong, and extremely strong evidence (likelihood ratios from 1.5:1 to 4:1 to 19:1) on updating of beliefs in a prior hypothesis initially assigned low, moderate, or high probability (from 0.1 to 0.5 to 0.9). Convergence in beliefs eventually occurs across the fifteen trials, but it occurs more rapidly when belief in the prior hypothesis is strong and the evidence is probative.
Of all the adjustments to probability scoring and Bayesian belief-updating indicators, fuzzy-set adjustments are, for obvious reasons, the most controversial.
CONCLUDING THOUGHTS
In the spirit of the old aphorism that you never know you have had enough until you have had more than enough, we have pushed the objectification of good judgment to its approximate point of diminishing returns, and then perhaps beyond. We recognize, of course, that not everyone will make the same epistemic judgment calls: some readers will conclude that we did not go far enough (and made too many concessions to defensive forecasters and their post-modern apologists) and others will conclude that we went far too far (and created a prime exhibit of pretentious “scientism”). There is no point in disguising the fallibility of our analytical apparatus or in denying the obvious: this effort falls far short if our benchmark is methodological perfection. But this effort fares better if we adopt a more realistic standard: Have we offered a starter framework for drawing cumulative lessons about the determinants of the accuracy of expert judgment in complex real-world situations? This project is intended to begin a conversation, not end it.
1 A. H. Murphy and R. L. Winkler, “Probability Forecasts: A Survey of National Weather Service Forecasters,” Bulletin of the American Meteorological Society 55 (1974): 1449–53; Murphy, “Probability Forecasts.”