Consider some examples.
Discrimination
Let us stipulate that in many domains, some people discriminate on legally impermissible grounds: race, sex, religion, disability, age. Discrimination might be a product of conscious bias, in the form of a desire to benefit or burden people of specific social groups. Alternatively, it might be a product of unconscious bias, in the form of automatic devaluation of which people are not aware and which they might reject, or be embarrassed by, if they were made aware of it. Whether conscious or unconscious, discrimination raises puzzles involving preference reversals. Does joint or separate evaluation matter? If so, is one better? As we shall see, no simple answer makes sense. But there is a fortunate qualification: if the goal is to stop discrimination, we can specify the circumstances in which joint or separate evaluation will be more likely to achieve that goal.
Iris Bohnet and her coauthors have found that in joint evaluation, people decide on the merits, but in separate evaluation, gender matters, so men have an advantage over women.29 To simplify the story: Suppose that people are evaluating two candidates, Jack and Jill. In separate evaluation, Jack has an advantage because he is male; people automatically value men more highly than women, and in separate evaluation that automatic assessment matters. But in joint evaluation, the advantage dissipates. People compare the two candidates, and if Jill’s qualifications are better, then she will be hired. Merit becomes decisive. That is the central finding by Bohnet et al.: discrimination occurs in separate evaluation; merit dominates in joint evaluation.30
The mechanism here seems straightforward. It does not involve evaluability. In the relevant population, there is a bias in favor of men, which means that in separate evaluation they will do better. But people know, in some sense, that they should not discriminate, and in joint evaluation the fact that they are discriminating will stare them in the face. If they see that they are discriminating, they will be embarrassed—and stop. It follows that when women have stronger qualifications than men, women will be chosen. We could easily imagine similar findings in the context of discrimination on the basis of race, religion, age, or disability.
From this finding, however, it would be a mistake to conclude that joint evaluation is generally a safeguard against discrimination. It works that way only under specified conditions; under different conditions, it might actually aggravate discrimination. Suppose, for example, that people have a self-conscious bias of some kind and that they are not at all ashamed of it. In joint evaluation, those who have such a bias might discriminate more, not less, than in separate evaluation. In separate evaluation, sex or race might not loom so large; discriminators might focus mainly on qualifications. But if people are biased against women or African Americans, they might discriminate more in joint evaluation simply because sex or race might crowd out other considerations; it might operate like total harmonic distortion in the case of the CD changers.
Imagine, for example, these cases:
Potential employee A: very strong record, excellent experience
Potential employee B: strong record, good experience, from a prominent family, not Jewish
In separate evaluation, potential employee A might well be preferred. (Let us suppose that potential employee A is not from a prominent family and is Jewish, but neither of those characteristics is salient in separate evaluation.) But if social attitudes take a particular form, potential employee B might well be preferred in joint evaluation. With respect to discrimination, everything depends on the relevant constellation of attitudes.
Punitive Damages
In the US legal system (and many others), juries are allowed to award punitive damages to punish wrongdoing. There is a pervasive question, to which the Supreme Court has occasionally been attentive, whether such awards are arbitrary or excessive.31 In deciding whether to give punitive damages and choosing what amount to award, juries are not permitted to consider comparison cases. They are making decisions in separate evaluation. Does that matter? If so, how? Here are some answers.
1. Normalization
Consider this pair of cases:
Case A: childproof safety cap fails and child needs hospital stay
Case B: repainted cars sold as new to a leasing agency
When people see cases of this kind in isolation, the two tend to receive similar punishment ratings (on a bounded scale of 1 to 8) and similar monetary awards.32 But when people see them jointly, case A receives much higher punishment ratings and much higher punitive awards.33 There is a clear reversal, but the mechanism is different from what we have seen previously. There is no problem of evaluability—at least not in the same sense. Nor is there anything like a characteristic that people might find relevant only or mostly in separate evaluation (such as gender). When people see a case of physical harm in separate evaluation, they spontaneously normalize it by comparing it to other cases of physical harm.34 A failed childproof safety cap is, in the category of physical harms, not good—but it is not all that bad. When people see a case of financial harm in separate evaluation, they engage in a similar act of normalization. Repainted cars are, in that category, not good—but not all that bad. Hence the two receive similar ratings and similar awards.
At the same time, people agree that a case of physical harm is generally worse than one of financial harm. It follows that in joint evaluation, the failed safety cap looks significantly worse, in the sense that it deserves more severe punishment. The effect of joint evaluation is to dislodge people from their spontaneous use of category-bound judgments. They will think more broadly. In experiments of this kind, joint evaluation has a significant effect because it enlarges the universe of cases about which participants will think. The mechanism is related to that in the context of sex discrimination, where joint evaluation forces a different and better form of deliberation.
There is a broad point about outrage here. Outrage is category-bound, in the sense that the level of felt outrage will be a function of the category in which the offending behavior falls. If someone cuts ahead in an airport security line or makes a rude comment on social media, people might feel high levels of outrage. But if they compare such behavior to, say, child abuse or assault, they might be a bit embarrassed (and feel far less outrage). Punitive damage judgments are a product of outrage.35 Because outrage is category-bound, preference reversals are essentially inevitable.
2. Manipulation Again
Here again, joint evaluation affords ample opportunity for manipulation in the selection of the comparison cases. Suppose that people are thinking about cases involving financial harm. If they are exposed to cases involving rape, financial harm might seem trivial and receive significantly lower ratings. But if they are exposed to cases involving minor acts of trespass on private property, financial harm might seem serious and receive significantly higher ratings. To see the point, compare these cases:
Case A: Jones, an editor at a national news magazine, sexually harassed a female employee; he tried, on several occasions, to kiss her against her will, and he made her extremely uncomfortable at work
Case B: Smith, who has had several tickets for reckless driving, recently hit a pedestrian at night; the pedestrian suffered five broken bones (including a severe concussion)
It is easily imaginable that in separate evaluation, Jones and Smith would receive equally severe punishment, or even that Jones’ punishment would be more severe—but that in joint evaluation, Smith’s punishment would be more severe. Now imagine a different case B:
Case B: Smith, a high school student in a rock band, has repeatedly played very loud music late at night, keeping his neighbor, Wilson, wide awake
It is easily imaginable that in joint evaluation, case B would produce a higher punishment for Jones than Jones would receive in separate evaluation.
3. Normative Considerations
Notwithstanding these points, it is true that for problems of this kind joint evaluation has an important advantage over separate evaluation: the latter will produce a pattern of results that people will themselves reject on reflection.36 In other words, separate evaluation will yield what separate evaluators will consider to be an incoherent set of results. As noted, punitive damage awards must be assessed in isolation; the legal system now forbids juries from considering comparison cases, and if jury awards are challenged, judicial consideration of such cases may occur—but in practice, it is severely restricted.
That does seem to be a mistake. If we want coherent patterns, joint evaluation is better. But there are two problems with joint evaluation: The first is that it is not global evaluation. If the goal is to produce reasoned outcomes, and to prevent what decisionmakers would themselves consider unfairness, they should look at a universe of cases, not two, and not a handful. The second problem is that in these kinds of cases, we might not want to celebrate what comes from either mode of evaluation. In the context of punitive damage awards, we cannot offer a normative judgment without some kind of theory about what punitive damages are for.
Suppose that we adopt an economic perspective focused on optimal deterrence and see punitive damage awards as justified to make up for the fact that the probability of detection and compensation is less than 100 percent. If so, the question is whether separate or joint evaluation focuses people on how to achieve optimal deterrence. Unfortunately, neither does so. People are intuitive retributivists.37 They do not naturally think in terms of optimal deterrence, and they will be reluctant to do so even if they are asked (see chapter 14).38 From the standpoint of optimal deterrence, the conclusion about joint and separate evaluation is simple: a pox on both your houses.
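The optimal-deterrence logic can be made concrete with a short, hedged sketch (the function names and dollar figures are illustrative, not from the text): if wrongdoing is detected and punished only with probability p, expected liability equals the harm caused only when total damages are scaled up to harm/p, with the punitive award supplying the top-up beyond compensation.

```python
# Illustrative sketch of the optimal-deterrence formula for punitive
# damages: total damages = harm / detection_probability, so that the
# wrongdoer's *expected* liability equals the harm caused.
# Function names and figures are hypothetical, chosen for illustration.

def optimal_total_damages(harm: float, detection_probability: float) -> float:
    """Total damages that make expected liability equal the harm caused."""
    if not 0 < detection_probability <= 1:
        raise ValueError("detection probability must be in (0, 1]")
    return harm / detection_probability


def punitive_component(harm: float, detection_probability: float) -> float:
    """The punitive top-up, once compensatory damages cover the harm itself."""
    return optimal_total_damages(harm, detection_probability) - harm


# If a $100,000 harm is detected and litigated only 25% of the time,
# expected liability matches the harm only at $400,000 in total damages,
# i.e., $300,000 in punitive damages on top of compensation.
print(punitive_component(100_000, 0.25))  # 300000.0
```

Note the contrast with retributive intuitions: on this account the punitive award depends on the detection probability, not on the outrageousness of the conduct, which is precisely why intuitive retributivists resist it.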
If we embrace a retributive theory of punishment, joint evaluation seems better. On one view, the question is how best to capture the moral outrage of the community, and in separate evaluation category-bound thinking ensures that the question will not be answered properly (by the community’s own lights). But one more time: the major problem with joint evaluation is that it is not global evaluation. That point argues in favor of some effort to broaden the jury’s viewscreen to consider cases in light of a range of other cases, to allow reviewing judges to do the same, or to create a damage schedule of some kind.
These conclusions bear on some philosophical questions on which preference reversals have been observed.39 To summarize a complex story,40 most people say that they would not push a person off a bridge and in front of a train, even if that is the only way to divert the train and thus to save five people who are in the train’s path. But most people say that they would be willing to flip a switch that would divert the train away from the five people and kill a bystander. Typically the two problems (usually called the footbridge and the trolley problems) are tested separately, and moral intuitions in the two cases diverge.
But what if people assess the problems jointly? The simple answer is that they endeavor to give the same answer to the two questions—either to say that they would save the five or that they would refuse to kill the one. (Perhaps surprisingly, there is no strong movement in the utilitarian direction, which counsels in favor of saving the five.) But it is not clear how to evaluate this shift. Nor is it even clear that in joint evaluation, people are showing moral consistency. Whether they are doing so depends on whether, on normative grounds, the footbridge and trolley problems are the same. The answer depends on whether we accept a utilitarian or nonutilitarian account, and whether we should be utilitarians cannot be dictated by how people respond in joint or separate evaluation.
Contingent Valuation
When regulators engage in cost-benefit analysis, they sometimes have to value goods that are not traded on markets or for which market evidence is unavailable or unreliable. In such circumstances, they engage in surveys, sometimes described as involving “stated preference” or “contingent valuation.”41 Let us bracket the serious controversies over the usefulness and reliability of these methods42 and ask a simple question: Do people’s valuations differ depending on whether they see cases separately or jointly?
Consider these cases, in which, on a bounded scale, people were asked about their satisfaction from providing help and their willingness to pay:
Cause A: program to improve detection of skin cancer in farm workers
Cause B: fund to clean up and protect dolphin breeding locations
When people see the two in isolation, they show a higher satisfaction rating from giving to cause B and they are willing to pay about the same.43 But when they evaluate them jointly, they show a much higher satisfaction rating from A and they want to pay far more for it.44 Here, the best explanation involves category-bound thinking.45 Detection of skin cancer among farm workers is important, but in terms of human health it may not be the most pressing priority. Protection of dolphins plucks at the heartstrings. But most people would want to pay more for the former than the latter if they are choosing between them.
For contingent valuation, is joint evaluation better than separate evaluation or vice versa? Is either approach reliable? For those who embrace contingent valuation methods, the goal is to discern how much informed people value various goods, replicating the idealized use of the willingness to pay criterion in well-functioning markets. If that is the goal, separate evaluation faces a serious problem, which is that people make judgments with a narrow viewscreen. On that count, joint evaluation seems better to the extent that it broadens the viewscreen—which means that the comparison cases cannot come from the same category. But even with that proviso, from which category should the comparison case be drawn? There is a risk of manipulation here, driving judgments about cause A up or down.
Global evaluation or something like it seems better than joint evaluation, but it is challenging to design anything like it in practice. If the theory of contingent valuation is generally plausible—a big if—then preference reversals give us new reason to emphasize the importance of broad viewscreens for participants.
Evaluating Nudges
Do people approve of nudges? Which ones? In recent years, that question has produced a growing literature.46 A central finding is that in the domains of safety, health, and the environment, people generally approve of nudges of the kind that have been adopted or seriously considered in various democracies in recent years.47 At the same time, majorities generally seem to favor educative nudges, such as mandatory information disclosure, over noneducative nudges, such as default rules.48
But do they really? Shai Davidai and Eldar Shafir have shown that in joint evaluation, they do—but in separate evaluation, they do not.49 Consider this stylized example:
Policy A: promote savings by automatically enrolling employees into pension plans, subject to opt out
Policy B: promote savings by giving employees clear, simple information about the benefits of enrolling into pension plans
In joint evaluation, most people will prefer an educative nudge and also rank it higher on a bounded scale. But in separate evaluation, the ratings are identical or at least similar.50 The best explanation, which should now be familiar, involves salience. In separate evaluation, people do not specifically focus on whether a nudge is or is not educative. In joint evaluation, the distinction along that dimension (educative or not?) becomes highly salient, and it drives people’s judgments.
Is joint or separate evaluation better? Note that in this context, the question is whether we have reason to trust one or another evaluative judgment. At first glance, the answer seems straightforward. Joint evaluation would be better if, on normative grounds, it is appropriate to make a sharp distinction between System 1 and System 2 nudges (that is, between noneducative nudges, which enlist automatic processes, and educative nudges, which engage deliberation). If it is indeed appropriate, joint evaluation is best because it places a bright spotlight on that distinction. But if, on normative grounds, that distinction is entitled to little or no weight, its salience in joint evaluation leads people in the wrong direction. One more time: the problem with joint evaluation is that it draws people’s attention very directly to a factor that might deserve little normative weight.
With respect to evaluation of policies, there is a much broader point here, connected with what we have seen in the context of consumer goods. Policy A might look good or bad in the abstract; it might be difficult to evaluate it unless its features are placed in some kind of context. The opportunity to see policy B can provide helpful information, but it might focus people on what distinguishes policy A from policy B and give it undue prominence. Here again, experimenters or politicians can engage in manipulation on this count. Global evaluation would be better, but, as before, it is challenging to implement. In cases like those presented by Davidai and Shafir, there is no escaping a normative question: Does the distinction made salient in joint evaluation matter or not? The normative question must be engaged on its merits, not by asking what people prefer and how the answer differs depending on joint or separate evaluation. It is tempting, and not wrong, to emphasize that in separate evaluation, people will not pay attention to a factor that might turn out to be critical in joint evaluation. But that is not an answer to the normative question.
What if we are trying to discover what people actually think or what they think on reflection? In this context, it is reasonable to wonder whether there is an answer to that question, at least for purposes of choosing between joint and separate evaluation. Because people’s answers are a product of which kind of evaluation is being asked of them, all that can be said is that to that extent, their preferences are labile.
How Change Happens Page 23