Rationality: From AI to Zombies


by Eliezer Yudkowsky


  “Intuition,” as a term of art, is not a curse word when it comes to morality—there is nothing else to argue from. Even modus ponens is an “intuition” in this sense—it’s just that modus ponens still seems like a good idea after being formalized, reflected on, extrapolated out to see if it has sensible consequences, et cetera.

  So that is “intuition.”

  However, Gowder did not say what he meant by “utilitarianism.” Does utilitarianism say . . .

  1. That right actions are strictly determined by good consequences?

  2. That praiseworthy actions depend on justifiable expectations of good consequences?

  3. That consequences should normatively be discounted by their probability, so that a 50% probability of something bad should weigh exactly half as much in our tradeoffs?

  4. That virtuous actions always correspond to maximizing expected utility under some utility function?

  5. That two harmful events are worse than one?

  6. That two independent occurrences of a harm (not to the same person, not interacting with each other) are exactly twice as bad as one?

  7. That for any two harms A and B, with A much worse than B, there exists some tiny probability such that gambling on this probability of A is preferable to a certainty of B?

  If you say that I advocate something, or that my argument depends on something, and that it is wrong, do please specify what this thingy is. Anyway, I accept 3, 5, 6, and 7, but not 4; I am not sure about the phrasing of 1; and 2 is true, I guess, but phrased in a rather solipsistic and selfish fashion: you should not worry about being praiseworthy.

  Now, what are the “intuitions” upon which my “utilitarianism” depends?

  This is a deepish sort of topic, but I’ll take a quick stab at it.

  First of all, it’s not just that someone presented me with a list of statements like those above, and I decided which ones sounded “intuitive.” Among other things, if you try to violate “utilitarianism,” you run into paradoxes, contradictions, circular preferences, and other things that aren’t symptoms of moral wrongness so much as moral incoherence.

  After you think about moral problems for a while, and also find new truths about the world, and even discover disturbing facts about how you yourself work, you often end up with different moral opinions than when you started out. This does not quite define moral progress, but it is how we experience moral progress.

  As part of my experienced moral progress, I’ve drawn a conceptual separation between questions of type Where should we go? and questions of type How should we get there? (Could that be what Gowder means by saying I’m “utilitarian”?)

  The question of where a road goes—where it leads—you can answer by traveling the road and finding out. If you have a false belief about where the road leads, this falsity can be destroyed by the truth in a very direct and straightforward manner.

  When it comes to wanting to go to a particular place, this want is not entirely immune from the destructive powers of truth. You could go there and find that you regret it afterward (which does not define moral error, but is how we experience moral error).

  But, even so, wanting to be in a particular place seems worth distinguishing from wanting to take a particular road to a particular place.

  Our intuitions about where to go are arguable enough, but our intuitions about how to get there are frankly messed up. After the two hundred and eighty-seventh research study showing that people will chop their own feet off if you frame the problem the wrong way, you start to distrust first impressions.

  When you’ve read enough research on scope insensitivity—people will pay only 28% more to protect all 57 wilderness areas in Ontario than one area, people will pay the same amount to save 50,000 lives as 5,000 lives . . . that sort of thing . . .

  Well, the worst case of scope insensitivity I’ve ever heard of was described here by Slovic:

  Other recent research shows similar results. Two Israeli psychologists asked people to contribute to a costly life-saving treatment. They could offer that contribution to a group of eight sick children, or to an individual child selected from the group. The target amount needed to save the child (or children) was the same in both cases. Contributions to individual group members far outweighed the contributions to the entire group.1

  There’s other research along similar lines, but I’m just presenting one example, ’cause, y’know, eight examples would probably have less impact.

  If you know the general experimental paradigm, then the reason for the above behavior is pretty obvious—focusing your attention on a single child creates more emotional arousal than trying to distribute attention around eight children simultaneously. So people are willing to pay more to help one child than to help eight.

  Now, you could look at this intuition, and think it was revealing some kind of incredibly deep moral truth which shows that one child’s good fortune is somehow devalued by the other children’s good fortune.

  But what about the billions of other children in the world? Why isn’t it a bad idea to help this one child, when that causes the value of all the other children to go down? How can it be significantly better to have 1,329,342,410 happy children than 1,329,342,409, but then somewhat worse to have seven more at 1,329,342,417?

  Or you could look at that and say: “The intuition is wrong: the brain can’t successfully multiply by eight and get a larger quantity than it started with. But it ought to, normatively speaking.”

  And once you realize that the brain can’t multiply by eight, then the other cases of scope neglect stop seeming to reveal some fundamental truth about 50,000 lives being worth just the same effort as 5,000 lives, or whatever. You don’t get the impression you’re looking at the revelation of a deep moral truth about nonagglomerative utilities. It’s just that the brain doesn’t goddamn multiply. Quantities get thrown out the window.

  If you have $100 to spend, and you spend $20 on each of 5 efforts that each save 5,000 lives, you will do worse than if you spend $100 on a single effort to save 50,000 lives. Likewise if such choices are made by 10 different people, rather than the same person. As soon as you start believing that it is better to save 50,000 lives than 25,000 lives, that simple preference of final destinations has implications for the choice of paths, when you consider five different events that save 5,000 lives.
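
  To make that arithmetic explicit, here is a minimal sketch in Python; the dollar amounts and life counts are the stipulations above, not data:

    # Each of the five $20 efforts saves 5,000 lives; the single $100 effort
    # saves 50,000.  These are the essay's stipulated numbers, nothing more.
    split = 5 * 5_000           # 25,000 lives from spreading the $100 around
    concentrated = 1 * 50_000   # 50,000 lives from concentrating it
    assert concentrated > split

  The totals come out the same whether one person makes all five choices or ten different people make them separately, which is the point about destinations constraining paths.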

  (It is a general principle that Bayesians see no difference between the long-run answer and the short-run answer; you never get two different answers from computing the same question two different ways. But the long run is a helpful intuition pump, so I am talking about it anyway.)

  The aggregative valuation strategy of “shut up and multiply” arises from the simple preference to have more of something—to save as many lives as possible—when you have to describe general principles for choosing more than once, acting more than once, planning at more than one time.

  Aggregation also arises from claiming that the local choice to save one life doesn’t depend on how many lives already exist, far away on the other side of the planet, or far away on the other side of the universe. Three lives are one and one and one. No matter how many billions are doing better, or doing worse. 3 = 1 + 1 + 1, no matter what other quantities you add to both sides of the equation. And if you add another life you get 4 = 1 + 1 + 1 + 1. That’s aggregation.

  When you’ve read enough heuristics and biases research, and enough coherence and uniqueness proofs for Bayesian probabilities and expected utility, and you’ve seen the “Dutch book” and “money pump” effects that penalize trying to handle uncertain outcomes any other way, then you don’t see the preference reversals in the Allais Paradox as revealing some incredibly deep moral truth about the intrinsic value of certainty. It just goes to show that the brain doesn’t goddamn multiply.
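
  For concreteness, here is a minimal sketch of the Allais setup, using the standard textbook gambles rather than anything quoted in this essay; the expected_value helper just multiplies payoffs by probabilities and sums:

    # Classic Allais gambles -- standard textbook numbers, not figures from
    # this essay.  Each gamble is a list of (probability, payoff) pairs.
    def expected_value(gamble):
        return sum(p * x for p, x in gamble)

    gamble_1a = [(1.00, 1_000_000)]
    gamble_1b = [(0.89, 1_000_000), (0.10, 5_000_000), (0.01, 0)]
    gamble_2a = [(0.11, 1_000_000), (0.89, 0)]
    gamble_2b = [(0.10, 5_000_000), (0.90, 0)]

    print(expected_value(gamble_1a), expected_value(gamble_1b))  # 1,000,000 vs 1,390,000
    print(expected_value(gamble_2a), expected_value(gamble_2b))  # ~110,000 vs 500,000

  The common pattern of preferring 1A to 1B while also preferring 2B to 2A cannot be produced by any utility function over the three outcomes: the first preference rearranges to 0.11 U($1M) > 0.10 U($5M) + 0.01 U($0), and the second rearranges to exactly the reverse inequality. That inconsistency, not the raw expected values, is what the coherence theorems penalize.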

  The primitive, perceptual intuitions that make a choice “feel good” don’t handle probabilistic pathways through time very skillfully, especially when the probabilities have been expressed symbolically rather than experienced as a frequency. So you reflect, devise more trustworthy logics, and think it through in words.

  When you see people insisting that no amount of money whatsoever is worth a single human life, and then driving an extra mile to save $10; or when you see people insisting that no amount of money is worth a decrement of health, and then choosing the cheapest health insurance available; then you don’t think that their protestations reveal some deep truth about incommensurable utilities.

  Part of it, clearly, is that primitive intuitions don’t successfully diminish the emotional impact of symbols standing for small quantities—anything you talk about seems like “an amount worth considering.”

  And part of it has to do with preferring unconditional social rules to conditional social rules. Conditional rules seem weaker, seem more subject to manipulation. If there’s any loophole that lets the government legally commit torture, then the government will drive a truck through that loophole.

  So it seems like there should be an unconditional social injunction against preferring money to life, and no “but” following it. Not even “but a thousand dollars isn’t worth a 0.0000000001% probability of saving a life.” Though the latter choice, of course, is revealed every time we sneeze without calling a doctor.
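
  A quick back-of-the-envelope check on that choice, with the dollar value of a statistical life as my own illustrative assumption rather than anything from the text:

    probability_life_saved = 1e-12           # the 0.0000000001% figure above
    value_of_statistical_life = 10_000_000   # assumed purely for illustration
    print(probability_life_saved * value_of_statistical_life)  # about $0.00001

  At those numbers the expected value of the life saved is roughly a hundred million times smaller than the thousand dollars, so an unconditional "life over money" injunction, taken literally, demands the trade anyway.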

  The rhetoric of sacredness gets bonus points for seeming to express an unlimited commitment, an unconditional refusal that signals trustworthiness and refusal to compromise. So you conclude that moral rhetoric espouses qualitative distinctions, because espousing a quantitative tradeoff would sound like you were plotting to defect.

  On such occasions, people vigorously want to throw quantities out the window, and they get upset if you try to bring quantities back in, because quantities sound like conditions that would weaken the rule.

  But you don’t conclude that there are actually two tiers of utility with lexical ordering. You don’t conclude that there is actually an infinitely sharp moral gradient, some atom that moves a Planck distance (in our continuous physical universe) and sends a utility from zero to infinity. You don’t conclude that utilities must be expressed using hyper-real numbers. Because the lower tier would simply vanish in any equation. It would never be worth the tiniest effort to recalculate for it. All decisions would be determined by the upper tier, and all thought spent thinking about the upper tier only, if the upper tier genuinely had lexical priority.

  As Peter Norvig once pointed out, if Asimov’s robots had strict priority for the First Law of Robotics (“A robot shall not harm a human being, nor through inaction allow a human being to come to harm”) then no robot’s behavior would ever show any sign of the other two Laws; there would always be some tiny First Law factor that would be sufficient to determine the decision.
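
  As a minimal sketch of why a strictly lexical upper tier swallows the decision, here is a toy construction of my own, not anything from the text: score each option as a (sacred, mundane) pair and compare lexicographically, so the mundane tier can only break exact ties in the sacred tier.

    from random import random

    def lexical_choice(options):
        # Python compares tuples lexicographically, so the mundane score only
        # matters when two sacred scores are exactly equal.
        return max(options, key=lambda o: (o["sacred"], o["mundane"]))

    # Give every option a tiny, distinct sacred component -- the analogue of
    # that "tiny First Law factor" -- and the mundane tier never decides
    # anything, no matter how large its values are.
    options = [{"sacred": random() * 1e-9, "mundane": random() * 1e9}
               for _ in range(1000)]
    best = lexical_choice(options)
    assert best["sacred"] == max(o["sacred"] for o in options)

  With real-valued sacred scores, exact ties essentially never occur, so every cycle spent computing the mundane tier is wasted, which is the sense in which the lower tier would simply vanish from the equation.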

  Whatever value is worth thinking about at all must be worth trading off against all other values worth thinking about, because thought itself is a limited resource that must be traded off. When you reveal a value, you reveal a utility.

  I don’t say that morality should always be simple. I’ve already said that the meaning of music is more than happiness alone, more than just a pleasure center lighting up. I would rather see music composed by people than by nonsentient machine learning algorithms, so that someone should have the joy of composition; I care about the journey, as well as the destination. And I am ready to hear if you tell me that the value of music is deeper, and involves more complications, than I realize—that the valuation of this one event is more complex than I know.

  But that’s for one event. When it comes to multiplying by quantities and probabilities, complication is to be avoided—at least if you care more about the destination than the journey. When you’ve reflected on enough intuitions, and corrected enough absurdities, you start to see a common denominator, a meta-principle at work, which one might phrase as “Shut up and multiply.”

  Where music is concerned, I care about the journey.

  When lives are at stake, I shut up and multiply.

  It is more important that lives be saved, than that we conform to any particular ritual in saving them. And the optimal path to that destination is governed by laws that are simple, because they are math.

  And that’s why I’m a utilitarian—at least when I am doing something that is overwhelmingly more important than my own feelings about it—which is most of the time, because there are not many utilitarians, and many things left undone.

  *

  1. Paul Slovic, “Numbed by Numbers,” Foreign Policy (March 2007), http://foreignpolicy.com/2007/03/13/numbed-by-numbers/.


  Ends Don’t Justify Means (Among Humans)

  If the ends don’t justify the means, what does?

  —variously attributed

  I think of myself as running on hostile hardware.

  —Justin Corwin

  Humans may have evolved a structure of political revolution, beginning by believing themselves morally superior to the corrupt current power structure, but ending by being corrupted by power themselves—not by any plan in their own minds, but by the echo of ancestors who did the same and thereby reproduced.

  This fits the template:

  In some cases, human beings have evolved in such fashion as to think that they are doing X for prosocial reason Y, but when human beings actually do X, other adaptations execute to promote self-benefiting consequence Z.

  From this proposition, I now move on to a question considerably outside the realm of classical Bayesian decision theory:

  What if I’m running on corrupted hardware?

  In such a case as this, you might even find yourself uttering such seemingly paradoxical statements—sheer nonsense from the perspective of classical decision theory—as:

  The ends don’t justify the means.

  But if you are running on corrupted hardware, then the reflective observation that it seems like a righteous and altruistic act to seize power for yourself—this seeming may not be much evidence for the proposition that seizing power is in fact the action that will most benefit the tribe.

  By the power of naive realism, the corrupted hardware that you run on, and the corrupted seemings that it computes, will seem like the fabric of the very world itself—simply the way-things-are.

  And so we have the bizarre-seeming rule: “For the good of the tribe, do not cheat to seize power even when it would provide a net benefit to the tribe.”

  Indeed it may be wiser to phrase it this way. If you just say, “when it seems like it would provide a net benefit to the tribe,” then you get people who say, “But it doesn’t just seem that way—it would provide a net benefit to the tribe if I were in charge.”

  The notion of untrusted hardware seems like something wholly outside the realm of classical decision theory. (What it does to reflective decision theory I can’t yet say, but that would seem to be the appropriate level to handle it.)

  But on a human level, the patch seems straightforward. Once you know about the warp, you create rules that describe the warped behavior and outlaw it. A rule that says, “For the good of the tribe, do not cheat to seize power even for the good of the tribe.” Or “For the good of the tribe, do not murder even for the good of the tribe.”

  And now the philosopher comes and presents their “thought experiment”—setting up a scenario in which, by stipulation, the only possible way to save five innocent lives is to murder one innocent person, and this murder is certain to save the five lives. “There’s a train heading to run over five innocent people, who you can’t possibly warn to jump out of the way, but you can push one innocent person into the path of the train, which will stop the train. These are your only options; what do you do?”

  An altruistic human, who has accepted certain deontological prohibitions—which seem well justified by some historical statistics on the results of reasoning in certain ways on untrustworthy hardware—may experience some mental distress, on encountering this thought experiment.

  So here’s a reply to that philosopher’s scenario, which I have yet to hear any philosopher’s victim give:

  “You stipulate that the only possible way to save five innocent lives is to murder one innocent person, and this murder will definitely save the five lives, and that these facts are known to me with effective certainty. But since I am running on corrupted hardware, I can’t occupy the epistemic state you want me to imagine. Therefore I reply that, in a society of Artificial Intelligences worthy of personhood and lacking any inbuilt tendency to be corrupted by power, it would be right for the AI to murder the one innocent person to save five, and moreover all its peers would agree. However, I refuse to extend this reply to myself, because the epistemic state you ask me to imagine can only exist among other kinds of people than human beings.”

  Now, to me this seems like a dodge. I think the universe is sufficiently unkind that we can justly be forced to consider situations of this sort. The sort of person who goes around proposing that sort of thought experiment might well deserve that sort of answer. But any human legal system does embody some answer to the question “How many innocent people can we put in jail to get the guilty ones?,” even if the number isn’t written down.

  As a human, I try to abide by the deontological prohibitions that humans have made to live in peace with one another. But I don’t think that our deontological prohibitions are literally inherently nonconsequentially terminally right. I endorse “the end doesn’t justify the means” as a principle to guide humans running on corrupted hardware, but I wouldn’t endorse it as a principle for a society of AIs that make well-calibrated estimates. (If you have one AI in a society of humans, that does bring in other considerations, like whether the humans learn from your example.)

 
