Value isn’t just complicated, it’s fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value—but more than one possible “single blow” will do so.
And then there is the long defense of this proposition, which relies on 75% of my Overcoming Bias posts, so that it would be more than one day’s work to summarize all of it. Maybe some other week. There are so many branches I’ve seen that discussion tree go down.
After all—a mind shouldn’t just go around having the same experience over and over and over again. Surely no superintelligence would be so grossly mistaken about the correct action?
Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries? Even if that were its utility function, wouldn’t it just notice that its utility function was wrong, and rewrite it? It’s got free will, right?
Surely, at least boredom has to be a universal value. It evolved in humans because it’s valuable, right? So any mind that doesn’t share our dislike of repetition will fail to thrive in the universe and be eliminated . . .
If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions . . .
. . . and you’re familiar with how the tradeoff between exploration and exploitation works in Artificial Intelligence . . .
. . . then you might be able to see that the human form of boredom that demands a steady trickle of novelty for its own sake isn’t a grand universal, but just a particular algorithm that evolution coughed out into us. And you might be able to see how the vast majority of possible expected utility maximizers would only engage in just so much efficient exploration, and spend most of their time exploiting the best alternative found so far, over and over and over.
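(A toy sketch of that pattern, with everything in it invented for illustration: an “epsilon-greedy” bandit agent that explores a small fraction of the time and otherwise exploits the best option found so far. The payoff numbers and the 5% exploration rate are arbitrary assumptions, not anything from the posts.)

    import random

    true_payoffs = [0.2, 0.5, 0.8]   # hidden expected reward of each action
    estimates = [0.0, 0.0, 0.0]      # the agent's running estimates
    counts = [0, 0, 0]
    epsilon = 0.05                   # small, fixed exploration rate

    for step in range(10_000):
        if random.random() < epsilon:
            arm = random.randrange(3)                              # rare exploration
        else:
            arm = max(range(3), key=lambda i: estimates[i])        # exploit the best found so far
        reward = 1.0 if random.random() < true_payoffs[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

    print(counts)   # nearly all pulls land on the best arm, repeated over and over

Run it and the counts come out lopsided: a brief burst of sampling, then the same best alternative exploited thousands of times.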
That’s a lot of background knowledge, though.
And so on and so on and so on through 75% of my posts on Overcoming Bias, and many chains of fallacy and counter-explanation. Some week I may try to write up the whole diagram. But for now I’m going to assume that you’ve read the arguments, and just deliver the conclusion:
We can’t relax our grip on the future—let go of the steering wheel—and still end up with anything of value.
And those who think we can—
—they’re trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can’t be sentient. And the cosmopolitan heroes who understand that minds don’t have to be just like us to be embraced as valuable—
I read those books. I once believed them. But the beauty that jumps out of one box is not jumping out of all boxes. If you leave behind all order, what is left is not the perfect answer; what is left is perfect noise. Sometimes you have to abandon an old design rule to build a better mousetrap, but that’s not the same as giving up all design rules and collecting wood shavings into a heap, with every pattern of wood as good as any other. The old rule is always abandoned at the behest of some higher rule, some higher criterion of value that governs.
If you loose the grip of human morals and metamorals—the result is not mysterious and alien and beautiful by the standards of human value. It is moral noise, a universe tiled with paperclips. To change away from human morals in the direction of improvement rather than entropy requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.
Relax the grip of human value upon the universe, and it will end up seriously valueless. Not strange and alien and wonderful, shocking and terrifying and beautiful beyond all human imagination. Just—tiled with paperclips.
It’s only some humans, you see, who have this idea of embracing manifold varieties of mind—of wanting the Future to be something greater than the past—of being not bound to our past selves—of trying to change and move forward.
A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.
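(If you want that sentence as a toy program, here it is; the candidate actions and the paperclip counts are made up, and nothing else enters the decision.)

    expected_paperclips = {
        "build another paperclip factory": 1_000_000,
        "preserve the sentient beings nearby": 12,
        "tile the future light cone with paperclips": 10**40,
    }

    def choose(actions):
        # The whole of its values: whichever action yields the most expected paperclips.
        return max(actions, key=actions.get)

    print(choose(expected_paperclips))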
No free lunch. You want a wonderful and mysterious universe? That’s your value. You work to create that value. Let that value exert its force through you who represent it; let it make decisions in you to shape the future. And maybe you shall indeed obtain a wonderful and mysterious universe.
No free lunch. Valuable things appear because a goal system that values them takes action to create them. Paperclips don’t materialize from nowhere for a paperclip maximizer. And a wonderfully alien and mysterious Future will not materialize from nowhere for us humans, if our values that prefer it are physically obliterated—or even disturbed in the wrong dimension. Then there is nothing left in the universe that works to make the universe valuable.
You do have values, even when you’re trying to be “cosmopolitan,” trying to display a properly virtuous appreciation of alien minds. Your values are then faded further into the invisible background—they are less obviously human. Your brain probably won’t even generate an alternative so awful that it would wake you up, make you say “No! Something went wrong!” even at your most cosmopolitan. E.g., “a nonsentient optimizer absorbs all matter in its future light cone and tiles the universe with paperclips.” You’ll just imagine strange alien worlds to appreciate.
Trying to be “cosmopolitan”—to be a citizen of the cosmos—just strips off a surface veneer of goals that seem obviously “human.”
But if you wouldn’t like the Future tiled over with paperclips, and you would prefer a civilization of . . .
. . . sentient beings . . .
. . . with enjoyable experiences . . .
. . . that aren’t the same experience over and over again . . .
. . . and are bound to something besides just being a sequence of internal pleasurable feelings . . .
. . . learning, discovering, freely choosing . . .
. . . well, my posts on Fun Theory go into some of the hidden details on those short English words.
Values that you might praise as cosmopolitan or universal or fundamental or obvious common sense are represented in your brain just as much as those values that you might dismiss as merely human. Those values come of the long history of humanity, and the morally miraculous stupidity of evolution that created us. (And once I finally came to that realization, I felt less ashamed of values that seemed “provincial”—but that’s another matter.)
These values do not emerge in all possible minds. They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer.
Touch too hard in the wrong dimension, and the physical representation of those values will shatter—and not come back, for there will be nothing left to want to bring it back.
And the referent of those values—a worthwhile universe—would no longer have any physical reason to come into being.
Let go of the steering wheel, and the Future crashes.
*
280. The Gift We Give to Tomorrow
How, oh how, could the universe, itself unloving and mindless, cough up minds who were capable of love?
“No mystery in that,” you say. “It’s just a matter of natural selection.”
But natural selection is cruel, bloody, and bloody stupid. Even when, on the surface of things, biological organisms aren’t directly fighting each other—aren’t directly tearing at each other with claws—there’s still a deeper competition going on between the genes. Genetic information is created when genes increase their relative frequency in the next generation—what matters for “genetic fitness” is not how many children you have, but that you have more children than others. It is quite possible for a species to evolve to extinction, if the winning genes are playing negative-sum games.
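(A toy model of that last possibility, with every rate in it invented for illustration: a “spiteful” gene that always gains relative frequency, while its growing prevalence shrinks the population that carries it.)

    pop_a, pop_b = 500.0, 500.0   # carriers of spiteful gene A vs. ordinary gene B

    for generation in range(30):
        frac_a = pop_a / (pop_a + pop_b)
        damage = 0.6 ** frac_a      # more A carriers means fewer survivors for everyone
        pop_a *= 1.1 * damage       # yet A still out-reproduces B in relative terms
        pop_b *= 1.0 * damage
        if pop_a + pop_b < 1.0:
            print(f"extinct by generation {generation}, "
                  f"with A at {pop_a / (pop_a + pop_b):.0%} frequency")
            break

Gene A “wins” every generation by the only standard natural selection has, and the species carrying it goes extinct anyway.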
How, oh how, could such a process create beings capable of love?
“No mystery,” you say. “There is never any mystery-in-the-world. Mystery is a property of questions, not answers. A mother’s children share her genes, so the mother loves her children.”
But sometimes mothers adopt children, and still love them. And mothers love their children for themselves, not for their genes.
“No mystery,” you say. “Individual organisms are adaptation-executers, not fitness-maximizers. Evolutionary psychology is not about deliberately maximizing fitness—through most of human history, we didn’t know genes existed. We don’t calculate our acts’ effect on genetic fitness consciously, or even subconsciously.”
But human beings form friendships even with non-relatives. How can that be?
“No mystery, for hunter-gatherers often play Iterated Prisoner’s Dilemmas, the solution to which is reciprocal altruism. Sometimes the most dangerous human in the tribe is not the strongest, the prettiest, or even the smartest, but the one who has the most allies.”
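(For concreteness, a minimal version of that game, using the conventional 5/3/1/0 payoffs and the classic “tit for tat” strategy as a stand-in for reciprocal altruism; everything here is a standard textbook toy, not a model of actual hunter-gatherers.)

    PAYOFFS = {                      # (my move, your move) -> my payoff
        ("C", "C"): 3, ("C", "D"): 0,
        ("D", "C"): 5, ("D", "D"): 1,
    }

    def tit_for_tat(opponent_history):
        return "C" if not opponent_history else opponent_history[-1]   # reciprocate

    def always_defect(opponent_history):
        return "D"

    def play(strategy_a, strategy_b, rounds=20):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            move_a = strategy_a(hist_b)     # each player sees the other's past moves
            move_b = strategy_b(hist_a)
            score_a += PAYOFFS[(move_a, move_b)]
            score_b += PAYOFFS[(move_b, move_a)]
            hist_a.append(move_a)
            hist_b.append(move_b)
        return score_a, score_b

    print(play(tit_for_tat, tit_for_tat))     # (60, 60): sustained cooperation
    print(play(tit_for_tat, always_defect))   # (19, 24): defection pays once, then stalls

Two reciprocators do far better over the long run than a reciprocator paired with a pure defector, which is the selection pressure the answer is pointing at.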
Yet not all friends are fair-weather friends; we have a concept of true friendship—and some people have sacrificed their life for their friends. Would not such a devotion tend to remove itself from the gene pool?
“You said it yourself: we have concepts of true friendship and of fair-weather friendship. We can tell, or try to tell, the difference between someone who considers us a valuable ally, and someone executing the friendship adaptation. We wouldn’t be true friends with someone who we didn’t think was a true friend to us—and someone with many true friends is far more formidable than someone with many fair-weather allies.”
And Mohandas Gandhi, who really did turn the other cheek? Those who try to serve all humanity, whether or not all humanity serves them in turn?
“That perhaps is a more complicated story. Human beings are not just social animals. We are political animals who argue linguistically about policy in adaptive tribal contexts. Sometimes the formidable human is not the strongest, but the one who can most skillfully argue that their preferred policies match the preferences of others.”
Um . . . that doesn’t explain Gandhi, or am I missing something?
“The point is that we have the ability to argue about ‘What should be done?’ as a proposition—we can make those arguments and respond to those arguments, without which politics could not take place.”
Okay, but Gandhi?
“Believed certain complicated propositions about ‘What should be done?’ and did them.”
That sounds suspiciously like it could explain any possible human behavior.
“If we traced back the chain of causality through all the arguments, it would involve: a moral architecture that had the ability to argue general abstract moral propositions like ‘What should be done to people?’; appeal to hardwired intuitions like fairness, a concept of duty, pain aversion, empathy; something like a preference for simple moral propositions, probably reused from our pre-existing Occam prior; and the end result of all this, plus perhaps memetic selection effects, was ‘You should not hurt people’ in full generality—”
And that gets you Gandhi.
“Unless you think it was magic, it has to fit into the lawful causal development of the universe somehow.”
I certainly won’t postulate magic, under any name.
“Good.”
But come on . . . doesn’t it seem a little . . . amazing . . . that hundreds of millions of years worth of evolution’s death tournament could cough up mothers and fathers, sisters and brothers, husbands and wives, steadfast friends and honorable enemies, true altruists and guardians of causes, police officers and loyal defenders, even artists sacrificing themselves for their art, all practicing so many kinds of love? For so many things other than genes? Doing their part to make their world less ugly, something besides a sea of blood and violence and mindless replication?
“Are you claiming to be surprised by this? If so, question your underlying model, for it has led you to be surprised by the true state of affairs.
Since the beginning,
not one unusual thing
has ever happened.”
But how is it not surprising?
“What would you suggest? That some sort of shadowy figure stood behind the scenes and directed evolution?”
Hell no. But—
“Because if you were suggesting that, I would have to ask how that shadowy figure originally decided that love was a desirable outcome of evolution. I would have to ask where that figure got preferences that included things like love, friendship, loyalty, fairness, honor, romance, and so on. On evolutionary psychology, we can see how that specific outcome came about—how those particular goals rather than others were generated in the first place. You can call it ‘surprising’ all you like. But when you really do understand evolutionary psychology, you can see how parental love and romance and honor, and even true altruism and moral arguments, bear the specific design signature of natural selection in particular adaptive contexts of the hunter-gatherer savanna. So if there was a shadowy figure, it must itself have evolved—and that obviates the whole point of postulating it.”
I’m not postulating a shadowy figure! I’m just asking how human beings ended up so nice.
“Nice! Have you looked at this planet lately? We bear all those other emotions that evolved, too—which would tell you very well that we evolved, should you begin to doubt it. Humans aren’t always nice.”
We’re one hell of a lot nicer than the process that produced us, which lets elephants starve to death when they run out of teeth, which doesn’t anesthetize a gazelle even as it lies dying and is of no further importance to evolution one way or the other. It doesn’t take much to be nicer than evolution. To have the theoretical capacity to make one single gesture of mercy, to feel a single twinge of empathy, is to be nicer than evolution.
How did evolution, which is itself so uncaring, create minds on that qualitatively higher moral level? How did evolution, which is so ugly, end up doing anything so beautiful?
“Beautiful, you say? Bach’s Little Fugue in G Minor may be beautiful, but the sound waves, as they travel through the air, are not stamped with tiny tags to specify their beauty. If you wish to find explicitly encoded a measure of the fugue’s beauty, you will have to look at a human brain—nowhere else in the universe will you find it. Not upon the seas or the mountains will you find such judgments written: they are not minds; they cannot think.”
Perhaps that is so. Yet evolution did in fact give us the ability to admire the beauty of a flower. That still seems to call for some deeper answer.
“Do you not see the circularity in your question? If beauty were like some great light in the sky that shined from outside humans, then your question might make sense—though there would still be the question of how humans came to perceive that light. You evolved with a psychology alien to evolution: Evolution has nothing like the intelligence or the precision required to exactly quine its goal system. In coughing up the first true minds, evolution’s simple fitness criterion shattered into a thousand values. You evolved with a psychology that attaches utility to things which evolution does not care about—human life, human happiness. And then you look back and say, ‘How marvelous!’ You marvel and you wonder at the fact that your values coincide with themselves.”
But then—it is still amazing that this particular circular loop, and not some other loop, came into the world. That we find ourselves praising love and not hate, beauty and not ugliness.
“I don’t think you understand. To you, it seems natural to privilege the beauty and altruism as special, as preferred, because you value them highly. And you don’t see this as an unusual fact about yourself, because many of your friends do likewise. So you expect that a ghost of perfect emptiness would also value life and happiness—and then, from this standpoint outside reality, a great coincidence would indeed have occurred.”
But you can make arguments for the importance of beauty and altruism from first principles—that our aesthetic senses lead us to create new complexity, instead of repeating the same things over and over; and that altruism is important because it takes us outside ourselves, gives our life a higher meaning than sheer brute selfishness.
“And that argument is going to move even a ghost of perfect emptiness? Because you’ve appealed to slightly different values? Those aren’t first principles. They’re just different principles. Speak in a grave and philosophical register, and still you shall find no universally compelling arguments. All you’ve done is pass the recursive buck.”
You don’t think that, somehow, we evolved to tap into something beyond—
“What good does it do to suppose something beyond? Why should we pay more attention to this beyond thing than we pay to our existence as humans? How does it alter your personal responsibility to say that you were only following the orders of the beyond thing? And you would still have evolved to let the beyond thing, rather than something else, direct your actions. It would be too much coincidence.”
Too much coincidence?
“A flower is beautiful, you say. Do you think there is no story behind that beauty, or that science does not know the story? Flower pollen is transmitted by bees, so by sexual selection, flowers evolved to attract bees—by imitating certain mating signs of bees, as it happened; the flowers’ patterns would look more intricate if you could see in the ultraviolet. Now healthy flowers are a sign of fertile land, likely to bear fruits and other treasures, and probably prey animals as well; so is it any wonder that humans evolved to be attracted to flowers? But for there to be some great light written upon the very stars—those huge unsentient balls of burning hydrogen—which also said that flowers were beautiful, now that would be far too much coincidence.”