Rationality: From AI to Zombies

by Eliezer Yudkowsky

*

150

The Hidden Complexity of Wishes

I wish to live in the locations of my choice, in a physically healthy, uninjured, and apparently normal version of my current body containing my current mental state, a body which will heal from all injuries at a rate three sigmas faster than the average given the medical technology available to me, and which will be protected from any diseases, injuries or illnesses causing disability, pain, or degraded functionality or any sense, organ, or bodily function for more than ten days consecutively or fifteen days in any year . . .

—The Open-Source Wish Project, Wish For Immortality 1.1

There are three kinds of genies: Genies to whom you can safely say, “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent.

Suppose your aged mother is trapped in a burning building, and it so happens that you’re in a wheelchair; you can’t rush in yourself. You could cry, “Get my mother out of that building!” but there would be no one to hear.

Luckily you have, in your pocket, an Outcome Pump. This handy device squeezes the flow of time, pouring probability into some outcomes, draining it from others.

The Outcome Pump is not sentient. It contains a tiny time machine, which resets time unless a specified outcome occurs. For example, if you hooked up the Outcome Pump’s sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads. (The physicists say that any future in which a “reset” occurs is inconsistent, and therefore never happens in the first place—so you aren’t actually killing any versions of yourself.)

Whatever proposition you can manage to input into the Outcome Pump somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that’s too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.

You can also redirect probability flow in more quantitative ways, using the “future function” to scale the temporal reset probability for different outcomes. If the temporal reset probability is 99% when the coin comes up heads, and 1% when the coin comes up tails, the odds will go from 1:1 to 99:1 in favor of tails. If you had a mysterious machine that spit out money, and you wanted to maximize the amount of money spit out, you would use reset probabilities that diminished as the amount of money increased. For example, spitting out $10 might have a 99.999999% reset probability, and spitting out $100 might have a 99.99999% reset probability. This way you can get an outcome that tends to be as high as possible in the future function, even when you don’t know the best attainable maximum.
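The coin arithmetic above can be checked with a short sketch. It assumes the Outcome Pump behaves like conditioning a prior on "no reset occurred"; the function name `pumped_distribution` and the dictionary layout are illustrative inventions, not part of the device as described.

```python
from fractions import Fraction

def pumped_distribution(prior, reset_prob):
    """Condition a prior over outcomes on the event that no reset occurred.

    Assumption: a timeline with outcome o is kept with probability
    1 - reset_prob[o], and only kept timelines are ever observed.
    """
    weights = {o: prior[o] * (1 - reset_prob[o]) for o in prior}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("every outcome resets with certainty")
    return {o: w / total for o, w in weights.items()}

# Fair coin, with the reset probabilities from the text.
prior = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
reset_prob = {"heads": Fraction(99, 100), "tails": Fraction(1, 100)}

posterior = pumped_distribution(prior, reset_prob)
odds_tails_to_heads = posterior["tails"] / posterior["heads"]
print(odds_tails_to_heads)  # 99
```

Heads survives a reset only 1% of the time and tails 99% of the time, so the surviving timelines favor tails 99:1, which matches the figure in the text.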

So you desperately yank the Outcome Pump from your pocket—your mother is still trapped in the burning building, remember?—and try to describe your goal: get your mother out of the building!

The user interface doesn’t take English inputs. The Outcome Pump isn’t sentient, remember? But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching. So you hold up a photo of your mother’s head and shoulders; match on the photo; use object contiguity to select your mother’s whole body (not just her head and shoulders); and define the future function using your mother’s distance from the building’s center. The farther she gets from the building’s center, the lower the time machine’s reset probability.

You cry “Get my mother out of the building!” for luck, and press Enter.

For a moment it seems like nothing happens. You look around, waiting for the fire truck to pull up, and rescuers to arrive—or even just a strong, fast runner to haul your mother out of the building—

BOOM! With a thundering roar, the gas main under the building explodes. As the structure comes apart, in what seems like slow motion, you glimpse your mother’s shattered body being hurled high into the air, traveling fast, rapidly increasing its distance from the former center of the building.

On the side of the Outcome Pump is an Emergency Regret Button. All future functions are automatically defined with a huge negative value for the Regret Button being pressed—a temporal reset probability of nearly 1—so that the Outcome Pump is extremely unlikely to do anything which upsets the user enough to make them press the Regret Button. You can’t ever remember pressing it. But you’ve barely started to reach for the Regret Button (and what good will it do now?) when a flaming wooden beam drops out of the sky and smashes you flat.

Which wasn’t really what you wanted, but scores very high in the defined future function . . .
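Why the explosion wins can be sketched numerically. Every number here is hypothetical: the three candidate outcomes, their prior probabilities, the distances, the 200 m cap, and the exact shape of the future function are all assumptions made for illustration. The only thing taken from the story is that the reset probability falls as the distance from the building's center grows.

```python
# Hypothetical outcomes, invented for illustration only.
# outcome -> (prior probability, final distance from the building's center in meters)
OUTCOMES = {
    "she stays trapped inside": (0.90, 5.0),
    "a firefighter escorts her out": (0.0999, 30.0),
    "the gas main explodes and throws her clear": (0.0001, 120.0),
}

MAX_DISTANCE_M = 200.0  # assumed cap on the distance the scanners can measure

def survival_probability(distance_m):
    """Assumed future function: the chance that a timeline is NOT reset.

    Each extra 10 m of distance makes a timeline 10 times more likely
    to survive; the reset probability is 1 minus this value.
    """
    d = min(distance_m, MAX_DISTANCE_M)
    return 10.0 ** ((d - MAX_DISTANCE_M) / 10.0)

def pumped_weights(outcomes):
    """Unnormalized probability of each outcome after the pump acts."""
    return {
        name: prior * survival_probability(dist)
        for name, (prior, dist) in outcomes.items()
    }

weights = pumped_weights(OUTCOMES)
most_likely = max(weights, key=weights.get)
print(most_likely)
```

Under these assumed numbers the explosion starts out a thousand times less likely than the firefighter, yet it dominates after pumping, because the future function rewards distance and nothing else.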

The Outcome Pump is a genie of the second class. No wish is safe.

If someone asked you to get their poor aged mother out of a burning building, you might help, or you might pretend not to hear. But it wouldn’t even occur to you to explode the building. “Get my mother out of the building” sounds like a much safer wish than it really is, because you don’t even consider the plans that you assign extreme negative values.

Consider again the Tragedy of Group Selectionism: Some early biologists asserted that group selection for low subpopulation sizes would produce individual restraint in breeding; and yet actually enforcing group selection in the laboratory produced cannibalism, especially of immature females. It’s obvious in hindsight that, given strong selection for small subpopulation sizes, cannibals will outreproduce individuals who voluntarily forgo reproductive opportunities. But eating little girls is such an un-aesthetic solution that Wynne-Edwards, Allee, Brereton, and the other group-selectionists simply didn’t think of it. They only saw the solutions they would have used themselves.

Suppose you try to patch the future function by specifying that the Outcome Pump should not explode the building: outcomes in which the building materials are distributed over too much volume will have ~1 temporal reset probabilities.

So your mother falls out of a second-story window and breaks her neck. The Outcome Pump took a different path through time that still ended up with your mother outside the building, and it still wasn’t what you wanted, and it still wasn’t a solution that would occur to a human rescuer.

If only the Open-Source Wish Project had developed a Wish To Get Your Mother Out Of A Burning Building:

I wish to move my mother (defined as the woman who shares half my genes and gave birth to me) to outside the boundaries of the building currently closest to me which is on fire; but not by exploding the building; nor by causing the walls to crumble so that the building no longer has boundaries; nor by waiting until after the building finishes burning down for a rescue worker to take out the body . . .

All these special cases, the seemingly unlimited number of required patches, should remind you of the parable of Artificial Addition—programming an Arithmetic Expert System by explicitly adding ever more assertions like “fifteen plus fifteen equals thirty, but fifteen plus sixteen equals thirty-one instead.”

How do you exclude the outcome where the building explodes and flings your mother into the sky? You look ahead, and you foresee that your mother would end up dead, and you don’t want that consequence, so you try to forbid the event leading up to it.

Your brain isn’t hardwired with a specific, prerecorded statement that “Blowing up a burning building containing my mother is a bad idea.” And yet you’re trying to prerecord that exact specific statement in the Outcome Pump’s future function. So the wish is exploding, turning into a giant lookup table that records your judgment of every possible path through time.

You failed to ask for what you really wanted. You wanted your mother to go on living, but you wished for her to become more distant from the center of the building.

Except that’s not all you wanted. If your mother was rescued from the building but was horribly burned, that outcome would rank lower in your preference ordering than an outcome where she was rescued safe and sound. So you not only value your mother’s life, but also her health.

And you value not just her bodily health, but her state of mind. Being rescued in a fashion that traumatizes her—for example, a giant purple monster roaring up out of nowhere and seizing her—is inferior to a fireman showing up and escorting her out through a non-burning route. (Yes, we’re supposed to stick with physics, but maybe a powerful enough Outcome Pump has aliens coincidentally showing up in the neighborhood at exactly that moment.) You would certainly prefer her being rescued by the monster to her being roasted alive, however.

How about a wormhole spontaneously opening and swallowing her to a desert island? Better than her being dead; but worse than her being alive, well, healthy, untraumatized, and in continual contact with you and the other members of her social network.

Would it be okay to save your mother’s life at the cost of the family dog’s life, if it ran to alert a fireman but then got run over by a car? Clearly yes, but it would be better ceteris paribus to avoid killing the dog. You wouldn’t want to swap a human life for hers, but what about the life of a convicted murderer? Does it matter if the murderer dies trying to save her, from the goodness of his heart? How about two murderers? If the cost of your mother’s life was the destruction of every extant copy, including the memories, of Bach’s Little Fugue in G Minor, would that be worth it? How about if she had a terminal illness and would die anyway in eighteen months?

If your mother’s foot is crushed by a burning beam, is it worthwhile to extract the rest of her? What if her head is crushed, leaving her body? What if her body is crushed, leaving only her head? What if there’s a cryonics team waiting outside, ready to suspend the head? Is a frozen head a person? Is Terri Schiavo a person? How much is a chimpanzee worth?

Your brain is not infinitely complicated; there is only a finite Kolmogorov complexity / message length which suffices to describe all the judgments you would make. But just because this complexity is finite does not make it small. We value many things, and no, they are not reducible to valuing happiness or valuing reproductive fitness.

There is no safe wish smaller than an entire human morality. There are too many possible paths through Time. You can’t visualize all the roads that lead to the destination you give the genie. “Maximizing the distance between your mother and the center of the building” can be done even more effectively by detonating a nuclear weapon. Or, at higher levels of genie power, flinging her body out of the Solar System. Or, at higher levels of genie intelligence, doing something that neither you nor I would think of, just like a chimpanzee wouldn’t think of detonating a nuclear weapon. You can’t visualize all the paths through time, any more than you can program a chess-playing machine by hardcoding a move for every possible board position.

And real life is far more complicated than chess. You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes. Especially if you wish for something longer-term or wider-range than rescuing your mother from a burning building.

I fear the Open-Source Wish Project is futile, except as an illustration of how not to think about genie problems. The only safe genie is a genie that shares all your judgment criteria, and at that point, you can just say “I wish for you to do what I should wish for.” Which simply runs the genie’s should function.

Indeed, it shouldn’t be necessary to say anything. To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish. Otherwise the genie may not choose a path through time that leads to the destination you had in mind, or it may fail to exclude horrible side effects that would lead you to not even consider a plan in the first place. Wishes are leaky generalizations, derived from the huge but finite structure that is your entire morality; only by including this entire structure can you plug all the leaks.

With a safe genie, wishing is superfluous. Just run the genie.

*

151

Anthropomorphic Optimism

The core fallacy of anthropomorphism is expecting something to be predicted by the black box of your brain, when its causal structure is so different from that of a human brain as to give you no license to expect any such thing.

The early (pre-1966) biologists in The Tragedy of Group Selectionism believed that predators would voluntarily restrain their breeding to avoid overpopulating their habitat and exhausting the prey population. Later on, when Michael J. Wade actually went out and created in the laboratory the nigh-impossible conditions for group selection, the adults adapted to cannibalize eggs and larvae, especially female larvae.1

Now, why might the group selectionists have not thought of that possibility?

Suppose you were a member of a tribe, and you knew that, in the near future, your tribe would be subjected to a resource squeeze. You might propose, as a solution, that no couple have more than one child—after the first child, the couple goes on birth control. Saying, “Let’s all individually have as many children as we can, but then hunt down and cannibalize each other’s children, especially the girls,” would not even occur to you as a possibility.

Think of a preference ordering over solutions, relative to your goals. You want a solution as high in this preference ordering as possible. How do you find one? With a brain, of course! Think of your brain as a high-ranking-solution-generator—a search process that produces solutions that rank high in your innate preference ordering.

The solution space on all real-world problems is generally fairly large, which is why you need an efficient brain that doesn’t even bother to formulate the vast majority of low-ranking solutions.

If your tribe is faced with a resource squeeze, you could try hopping everywhere on one leg, or chewing off your own toes. These “solutions” obviously wouldn’t work and would incur large costs, as you can see upon examination—but in fact your brain is too efficient to waste time considering such poor solutions; it doesn’t generate them in the first place. Your brain, in its search for high-ranking solutions, flies directly to parts of the solution space like “Everyone in the tribe gets together, and agrees to have no more than one child per couple until the resource squeeze is past.”

Such a low-ranking solution as “Everyone have as many kids as possible, then cannibalize the girls” would not be generated in your search process.

But the ranking of an option as “low” or “high” is not an inherent property of the option. It is a property of the optimization process that does the preferring. And different optimization processes will search in different orders.

So far as evolution is concerned, individuals reproducing to the fullest and then cannibalizing others’ daughters is a no-brainer; whereas individuals voluntarily restraining their own breeding for the good of the group is absolutely ludicrous. Or to say it less anthropomorphically, the first set of alleles would rapidly replace the second in a population. (And natural selection has no obvious search order here—these two alternatives seem around equally simple as mutations.)
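The claim that "low" and "high" belong to the optimizer rather than to the option can be shown with a toy ranking. The option labels and every score are invented for illustration; `gene_score` is a crude stand-in for how quickly an allele would spread, not a model of real population genetics.

```python
# Hypothetical options and scores, made up purely for illustration.
OPTIONS = [
    "restrain breeding for the good of the group",
    "breed fully, then cannibalize others' young",
    "hop everywhere on one leg",
]

# Assumed human preference ordering (higher is better).
human_score = {
    "restrain breeding for the good of the group": 0.9,
    "breed fully, then cannibalize others' young": -1.0,
    "hop everywhere on one leg": -0.5,
}

# Assumed stand-in for relative reproductive success of the underlying alleles.
gene_score = {
    "restrain breeding for the good of the group": -0.8,
    "breed fully, then cannibalize others' young": 0.9,
    "hop everywhere on one leg": -0.9,
}

def rank(options, score):
    """Order the same options from best to worst under a given criterion."""
    return sorted(options, key=score.get, reverse=True)

human_ranking = rank(OPTIONS, human_score)
gene_ranking = rank(OPTIONS, gene_score)
print(human_ranking[0])
print(gene_ranking[0])
```

The option list is identical in both calls; only the scoring criterion changes, and with it the top-ranked answer.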

Suppose that one of the biologists had said, “If a predator population has only finite resources, evolution will craft them to voluntarily restrain their breeding—that’s how I’d do it if I were in charge of building predators.” This would be anthropomorphism outright, the lines of reasoning naked and exposed: I would do it this way, therefore I infer that evolution will do it this way.

One does occasionally encounter the fallacy outright, in my line of work. But suppose you say to the one, “An AI will not necessarily work like you do.” Suppose you say to this hypothetical biologist, “Evolution doesn’t work like you do.” What will the one say in response? I can tell you a reply you will not hear: “Oh my! I didn’t realize that! One of the steps of my inference was invalid; I will throw away the conclusion and start over from scratch.”

No: what you’ll hear instead is a reason why any AI has to reason the same way as the speaker. Or a reason why natural selection, following entirely different criteria of optimization and using entirely different methods of optimization, ought to do the same thing that would occur to a human as a good idea.

Hence the elaborate idea that group selection would favor predator groups where the individuals voluntarily forsook reproductive opportunities.

The group selectionists went just as far astray, in their predictions, as someone committing the fallacy outright. Their final conclusions were the same as if they were assuming outright that evolution necessarily thought like themselves. But they erased what had been written above the bottom line of their argument, without erasing the actual bottom line, and wrote in new rationalizations. Now the fallacious reasoning is disguised; the obviously flawed step in the inference has been hidden—even though the conclusion remains exactly the same; and hence, in the real world, exactly as wrong.

But why would any scientist do this? In the end, the data came out against the group selectionists and they were embarrassed.

As I remarked in Fake Optimization Criteria, we humans seem to have evolved an instinct for arguing that our preferred policy arises from practically any criterion of optimization. Politics was a feature of the ancestral environment; we are descended from those who argued most persuasively that the tribe’s interest—not just their own interest—required that their hated rival Uglak be executed. We certainly aren’t descended from Uglak, who failed to argue that his tribe’s moral code—not just his own obvious self-interest—required his survival.

And because we can more persuasively argue for what we honestly believe, we have evolved an instinct to honestly believe that other people’s goals, and our tribe’s moral code, truly do imply that they should do things our way for their benefit.
