Rationality: From AI to Zombies

by Eliezer Yudkowsky

You may recall from my previous writing on “empathic inference” the idea that brains are so complex that the only way to simulate them is by forcing a similar brain to behave similarly. A brain is so complex that if a human tried to understand brains the way that we understand e.g. gravity or a car—observing the whole, observing the parts, building up a theory from scratch—then we would be unable to invent good hypotheses in our mere mortal lifetimes. The only possible way you can hit on an “Aha!” that describes a system as incredibly complex as an Other Mind, is if you happen to run across something amazingly similar to the Other Mind—namely your own brain—which you can actually force to behave similarly and use as a hypothesis, yielding predictions.

So that is what I would call “empathy.”

And then “sympathy” is something else on top of this—to smile when you see someone else smile, to hurt when you see someone else hurt. It goes beyond the realm of prediction into the realm of reinforcement.

And you ask, “Why would callous natural selection do anything that nice?”

It might have gotten started, maybe, with a mother’s love for her children, or a brother’s love for a sibling. You can want them to live, you can want them to be fed, sure; but if you smile when they smile and wince when they wince, that’s a simple urge that leads you to deliver help along a broad avenue, in many walks of life. So long as you’re in the ancestral environment, what your relatives want probably has something to do with your relatives’ reproductive success—this being an explanation for the selection pressure, of course, not a conscious belief.

You may ask, “Why not evolve a more abstract desire to see certain people tagged as ‘relatives’ get what they want, without actually feeling yourself what they feel?” And I would shrug and reply, “Because then there’d have to be a whole definition of ‘wanting’ and so on. Evolution doesn’t take the elaborate correct optimal path, it falls up the fitness landscape like water flowing downhill. The mirroring-architecture was already there, so it was a short step from empathy to sympathy, and it got the job done.”

Relatives—and then reciprocity; your allies in the tribe, those with whom you trade favors. Tit for Tat, or evolution’s elaboration thereof to account for social reputations.

Who is the most formidable, among the human kind? The strongest? The smartest? More often than either of these, I think, it is the one who can call upon the most friends.

So how do you make lots of friends?

You could, perhaps, have a specific urge to bring your allies food, like a vampire bat—they have a whole system of reciprocal blood donations going in those colonies. But it’s a more general motivation, that will lead the organism to store up more favors, if you smile when designated friends smile.

And what kind of organism will avoid making its friends angry at it, in full generality? One that winces when they wince.

Of course you also want to be able to kill designated Enemies without a qualm—these are humans we’re talking about.

But . . . I’m not sure of this, but it does look to me like sympathy, among humans, is “on” by default. There are cultures that help strangers . . . and cultures that eat strangers; the question is which of these requires the explicit imperative, and which is the default behavior for humans. I don’t really think I’m being such a crazy idealistic fool when I say that, based on my admittedly limited knowledge of anthropology, it looks like sympathy is on by default.

Either way . . . it’s painful if you’re a bystander in a war between two sides, and your sympathy has not been switched off for either side, so that you wince when you see a dead child no matter what the caption on the photo; and yet those two sides have no sympathy for each other, and they go on killing.

So that is the human idiom of sympathy—a strange, complex, deep implementation of reciprocity and helping. It tangles minds together—not by a term in the utility function for some other mind’s “desire,” but by the simpler and yet far more consequential path of mirror neurons: feeling what the other mind feels, and seeking similar states. Even if it’s only done by observation and inference, and not by direct transmission of neural information as yet.

Empathy is a human way of predicting other minds. It is not the only possible way.

The human brain is not quickly rewirable; if you’re suddenly put into a dark room, you can’t rewire the visual cortex as auditory cortex, so as to better process sounds, until you leave, and then suddenly shift all the neurons back to being visual cortex again.

An AI, at least one running on anything like a modern programming architecture, can trivially shift computing resources from one thread to another. Put in the dark? Shut down vision and devote all those operations to sound; swap the old program to disk to free up the RAM, then swap the disk back in again when the lights go on.

So why would an AI need to force its own mind into a state similar to what it wanted to predict? Just create a separate mind-instance—maybe with different algorithms, the better to simulate that very dissimilar human. Don’t try to mix up the data with your own mind-state; don’t use mirror neurons. Think of all the risk and mess that implies!

An expected utility maximizer—especially one that does understand intelligence on an abstract level—has other options than empathy, when it comes to understanding other minds. The agent doesn’t need to put itself in anyone else’s shoes; it can just model the other mind directly. A hypothesis like any other hypothesis, just a little bigger. You don’t need to become your shoes to understand your shoes.

And sympathy? Well, suppose we’re dealing with an expected paperclip maximizer, but one that isn’t yet powerful enough to have things all its own way—it has to deal with humans to get its paperclips. So the paperclip agent . . . models those humans as relevant parts of the environment, models their probable reactions to various stimuli, and does things that will make the humans feel favorable toward it in the future.

To a paperclip maximizer, the humans are just machines with pressable buttons. No need to feel what the other feels—if that were even possible across such a tremendous gap of internal architecture. How could an expected paperclip maximizer “feel happy” when it saw a human smile? “Happiness” is an idiom of policy reinforcement learning, not expected utility maximization. A paperclip maximizer doesn’t feel happy when it makes paperclips; it just chooses whichever action leads to the greatest number of expected paperclips. Though a paperclip maximizer might find it convenient to display a smile when it made paperclips—so as to help manipulate any humans that had designated it a friend.
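The decision rule described above can be sketched in a few lines. This is a toy illustration only: the action names, probabilities, and paperclip counts are invented for the example, and nothing in it models "feeling" anything. Smiling at the humans is chosen here purely because, under these made-up numbers, it has the highest expected paperclip count.

```python
# Hypothetical toy model: each action maps to a list of
# (probability, resulting paperclips) outcomes. All numbers are invented.
ACTIONS = {
    "build_factory_now": [(0.5, 100.0), (0.5, 0.0)],
    "smile_at_humans": [(0.9, 60.0), (0.1, 10.0)],
    "do_nothing": [(1.0, 5.0)],
}


def expected_paperclips(outcomes):
    """Probability-weighted sum of paperclips over an action's outcomes."""
    return sum(probability * clips for probability, clips in outcomes)


def choose_action(actions):
    """Pick the action with the greatest expected paperclips.

    There is no reward signal and no reinforcement here: the agent
    simply evaluates each option and takes the maximum.
    """
    return max(actions, key=lambda name: expected_paperclips(actions[name]))


if __name__ == "__main__":
    print(choose_action(ACTIONS))
```

Under these assumed numbers the smile wins over building the factory, which is the point of the passage: the display is instrumental, one more lever on the machine.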

You might find it a bit difficult to imagine such an algorithm—to put yourself into the shoes of something that does not work like you do, and does not work like any mode your brain can make itself operate in.

You can make your brain operate in the mode of hating an enemy, but that’s not right either. The way to imagine how a truly unsympathetic mind sees a human is to imagine yourself as a useful machine with levers on it. Not a human-shaped machine, because we have instincts for that. Just a woodsaw or something. Some levers make the machine output coins; other levers might make it fire a bullet. The machine does have a persistent internal state and you have to pull the levers in the right order. Regardless, it’s just a complicated causal system—nothing inherently mental about it.

(To understand unsympathetic optimization processes, I would suggest studying natural selection, which doesn’t bother to anesthetize fatally wounded and dying creatures, even when their pain no longer serves any reproductive purpose, because the anesthetic would serve no reproductive purpose either.)

That’s why I list “sympathy” in front of even “boredom” on my list of things that would be required to have aliens that are the least bit, if you’ll pardon the phrase, sympathetic. It’s not impossible that sympathy exists among some significant fraction of all evolved alien intelligent species; mirror neurons seem like the sort of thing that, having happened once, could happen again.

Unsympathetic aliens might be trading partners—or not; stars and such resources are pretty much the same the universe over. We might negotiate treaties with them, and they might keep them for calculated fear of reprisal. We might even cooperate in the Prisoner’s Dilemma. But we would never be friends with them. They would never see us as anything but means to an end. They would never shed a tear for us, nor smile for our joys. And the others of their own kind would receive no different consideration, nor have any sense that they were missing something important thereby.

Such aliens would be varelse, not ramen—the sort of aliens we can’t relate to on any personal level, and no point in trying.

*

277

High Challenge

There’s a class of prophecy that runs: “In the Future, machines will do all the work. Everything will be automated. Even labor of the sort we now consider ‘intellectual,’ like engineering, will be done by machines. We can sit back and own the capital. You’ll never have to lift a finger, ever again.”

But then won’t people be bored?

No; they can play computer games—not like our games, of course, but much more advanced and entertaining.

Yet wait! If you buy a modern computer game, you’ll find that it contains some tasks that are—there’s no kind word for this—effortful. (I would even say “difficult,” with the understanding that we’re talking about something that takes ten minutes, not ten years.)

So in the future, we’ll have programs that help you play the game—taking over if you get stuck on the game, or just bored; or so that you can play games that would otherwise be too advanced for you.

But isn’t there some wasted effort, here? Why have one programmer working to make the game harder, and another programmer working to make the game easier? Why not just make the game easier to start with? Since you play the game to get gold and experience points, making the game easier will let you get more gold per unit time: the game will become more fun.

So this is the ultimate end of the prophecy of technological progress—just staring at a screen that says “YOU WIN,” forever.

And maybe we’ll build a robot that does that, too.

Then what?

The world of machines that do all the work—well, I don’t want to say it’s “analogous to the Christian Heaven” because it isn’t supernatural; it’s something that could in principle be realized. Religious analogies are far too easily tossed around as accusations . . . But, without implying any other similarities, I’ll say that it seems analogous in the sense that eternal laziness “sounds like good news” to your present self who still has to work.

And as for playing games, as a substitute—what is a computer game except synthetic work? Isn’t there a wasted step here? (And computer games in their present form, considered as work, have various aspects that reduce stress and increase engagement; but they also carry costs in the form of artificiality and isolation.)

I sometimes think that futuristic ideals phrased in terms of “getting rid of work” would be better reformulated as “removing low-quality work to make way for high-quality work.”

There’s a broad class of goals that aren’t suitable as the long-term meaning of life, because you can actually achieve them, and then you’re done.

To look at it another way, if we’re looking for a suitable long-run meaning of life, we should look for goals that are good to pursue and not just good to satisfy.

Or to phrase that somewhat less paradoxically: We should look for valuations that are over 4D states, rather than 3D states. Valuable ongoing processes, rather than “make the universe have property P and then you’re done.”
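One way to make the 3D/4D distinction concrete is a small sketch in which a "history" is a list of world-states over time. The field names `P` and `engagement` are hypothetical stand-ins chosen for this example, and the two valuation functions are only one possible reading of the distinction.

```python
# A history is a list of states, one per time step. The keys "P" and
# "engagement" are hypothetical, introduced only for this illustration.


def value_3d(history):
    """Valuation over a 3D state: only the final snapshot matters."""
    return 1.0 if history[-1]["P"] else 0.0


def value_4d(history):
    """Valuation over a 4D state: the whole trajectory contributes."""
    return sum(state["engagement"] for state in history)


# Property P handed over on a silver platter in a single step.
handed_over = [{"P": True, "engagement": 0.0}]

# Property P reached at the end of an engaging process.
worked_for = [
    {"P": False, "engagement": 1.0},
    {"P": False, "engagement": 1.0},
    {"P": True, "engagement": 1.0},
]

if __name__ == "__main__":
    print(value_3d(handed_over), value_3d(worked_for))
    print(value_4d(handed_over), value_4d(worked_for))
```

The 3D valuation cannot tell the two histories apart, since both end with property P holding; the 4D valuation can, because it scores the process and not just the endpoint.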

Timothy Ferriss is worth quoting: To find happiness, “the question you should be asking isn’t ‘What do I want?’ or ‘What are my goals?’ but ‘What would excite me?’”

You might say that for a long-run meaning of life, we need games that are fun to play and not just to win.

Mind you—sometimes you do want to win. There are legitimate goals where winning is everything. If you’re talking, say, about curing cancer, then the suffering experienced by even a single cancer patient outweighs any fun that you might have in solving their problems. If you work at creating a cancer cure for twenty years through your own efforts, learning new knowledge and new skill, making friends and allies—and then some alien superintelligence offers you a cancer cure on a silver platter for thirty bucks—then you shut up and take it.

But “curing cancer” is a problem of the 3D-predicate sort: you want the no-cancer predicate to go from False in the present to True in the future. The importance of this destination far outweighs the journey; you don’t want to go there, you just want to be there. There are many legitimate goals of this sort, but they are not suitable as long-run fun. “Cure cancer!” is a worthwhile activity for us to pursue here and now, but it is not a plausible future goal of galactic civilizations.

Why should this “valuable ongoing process” be a process of trying to do things—why not a process of passive experiencing, like the Buddhist Heaven?

I confess I’m not entirely sure how to set up a “passively experiencing” mind. The human brain was designed to perform various sorts of internal work that add up to an active intelligence; even if you lie down on your bed and exert no particular effort to think, the thoughts that go on through your mind are activities of brain areas that are designed to, you know, solve problems.

How much of the human brain could you eliminate, apart from the pleasure centers, and still keep the subjective experience of pleasure?

I’m not going to touch that one. I’ll stick with the much simpler answer of “I wouldn’t actually prefer to be a passive experiencer.” If I wanted Nirvana, I might try to figure out how to achieve that impossibility. But once you strip away Buddha telling me that Nirvana is the end-all of existence, Nirvana seems rather more like “sounds like good news in the moment of first being told” or “ideological belief in desire,” rather than, y’know, something I’d actually want.

The reason I have a mind at all is that natural selection built me to do things—to solve certain kinds of problems.

“Because it’s human nature” is not an explicit justification for anything. There is human nature, which is what we are; and there is humane nature, which is what, being human, we wish we were.

But I don’t want to change my nature toward a more passive object—which is a justification. A happy blob is not what, being human, I wish to become.

I earlier argued that many values require both subjective happiness and the external objects of that happiness. That you can legitimately have a utility function that says, “It matters to me whether or not the person I love is a real human being or just a highly realistic nonsentient chatbot, even if I don’t know, because that-which-I-value is not my own state of mind, but the external reality.” So that you need both the experience of love, and the real lover.

You can similarly have valuable activities that require both real challenge and real effort.

Racing along a track, it matters that the other racers are real, and that you have a real chance to win or lose. (We’re not talking about physical determinism here, but whether some external optimization process explicitly chose for you to win the race.)

And it matters that you’re racing with your own skill at running and your own willpower, not just pressing a button that says “Win.” (Though, since you never designed your own leg muscles, you are racing using strength that isn’t yours. A race between robot cars is a purer contest of their designers. There is plenty of room to improve on the human condition.)

And it matters that you, a sentient being, are experiencing it. (Rather than some nonsentient process carrying out a skeleton imitation of the race, trillions of times per second.)

There must be the true effort, the true victory, and the true experience—the journey, the destination and the traveler.

*

278

Serious Stories

Every Utopia ever constructed—in philosophy, fiction, or religion—has been, to one degree or another, a place where you wouldn’t actually want to live. I am not alone in this important observation: George Orwell said much the same thing in “Why Socialists Don’t Believe In Fun,” and I expect that many others said it earlier.

If you read books on How To Write—and there are a lot of books out there on How To Write, because, amazingly, a lot of book-writers think they know something about writing—these books will tell you that stories must contain “conflict.”

That is, the more lukewarm sort of instructional book will tell you that stories contain “conflict.” But some authors speak more plainly.

“Stories are about people’s pain.” Orson Scott Card.

“Every scene must end in disaster.” Jack Bickham.

In the age of my youthful folly, I took for granted that authors were excused from the search for true Eutopia, because if you constructed a Utopia that wasn’t flawed . . . what stories could you write, set there? “Once upon a time they lived happily ever after.” What use would it be for a science-fiction author to try to depict a positive intelligence explosion, when a positive intelligence explosion would be . . .

. . . the end of all stories?

It seemed like a reasonable framework with which to examine the literary problem of Utopia, but something about that final conclusion produced a quiet, nagging doubt.

At that time I was thinking of an AI as being something like a safe wish-granting genie for the use of individuals. So the conclusion did make a kind of sense. If there was a problem, you would just wish it away, right? Ergo—no stories. So I ignored the quiet, nagging doubt.

Much later, after I concluded that even a safe genie wasn’t such a good idea, it also seemed in retrospect that “no stories” could have been a productive indicator. On this particular occasion, “I can’t think of a single story I’d want to read about this scenario,” might indeed have pointed me toward the reason “I wouldn’t want to actually live in this scenario.”
