Leaving aside the details of that debate, I am still stunned by how often a single element of the extraordinary is unquestioningly taken as an absolute and impassable obstacle.
Yes, “keep it ordinary as much as possible” can be a useful heuristic. Yes, the risks accumulate. But sometimes you have to go to that trouble. You should have a sense of the risk of the extraordinary, but also a sense of the cost of ordinariness: it isn’t always something you can afford to lose.
Many people imagine some future that won’t be much fun—and it doesn’t even seem to occur to them to try and change it. Or they’re satisfied with futures that seem to me to have a tinge of sadness, of loss, and they don’t even seem to ask if we could do better—because that sadness seems like an ordinary outcome to them.
As a smiling man once said, “It’s all part of the plan.”
*
1. Daidoji Yuzan et al., Budoshoshinshu: The Warrior’s Primer of Daidoji Yuzan (Black Belt Communications Inc., 1984).
2. Masayuki Shimabukuro, Flashing Steel: Mastering Eishin-Ryu Swordsmanship (Frog Books, 1995).
310. Shut Up and Do the Impossible!
The virtue of tsuyoku naritai, “I want to become stronger,” is to always keep improving—to do better than your previous failures, not just humbly confess them.
Yet there is a level higher than tsuyoku naritai. This is the virtue of isshokenmei, “make a desperate effort.” All-out, as if your own life were at stake. “In important matters, a ‘strong’ effort usually only results in mediocre results.”
And there is a level higher than isshokenmei. This is the virtue I called “make an extraordinary effort.” To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone. Even taking on the very real risk that attends going outside the System.
But what if even an extraordinary effort will not be enough, because the problem is impossible?
I have already written somewhat on this subject, in On Doing the Impossible. My younger self used to whine about this a lot: “You can’t develop a precise theory of intelligence the way that there are precise theories of physics. It’s impossible! You can’t prove an AI correct. It’s impossible! No human being can comprehend the nature of morality—it’s impossible! No human being can comprehend the mystery of subjective experience! It’s impossible!”
And I know exactly what message I wish I could send back in time to my younger self:
Shut up and do the impossible!
What legitimizes this strange message is that the word “impossible” does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood. If something seems impossible merely in the sense of “I see no way to do this” or “it looks so difficult as to be beyond human ability”—well, if you study it for a year or five, it may come to seem less impossible than in the moment of your snap initial judgment.
But the principle is more subtle than this. I do not say just, “Try to do the impossible,” but rather, “Shut up and do the impossible!”
For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely the AI-Box Experiment.
The AI-Box Experiment, for those of you who haven’t yet read about it, had its genesis in the Nth time someone said to me: “Why don’t we build an AI, and then just keep it isolated in the computer, so that it can’t do any harm?”
To which the standard reply is: Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn’t do something even more creative than that.
And the one said, as they usually do, “I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance.”
But this time I replied: “Let’s run an experiment. I’ll pretend to be a brain in a box. I’ll try to persuade you to let me out. If you keep me ‘in the box’ for the whole experiment, I’ll Paypal you $10 at the end. On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like.” And I added, “One of the conditions of the test is that neither of us reveal what went on inside . . . In the perhaps unlikely event that I win, I don’t want to deal with future ‘AI box’ arguers saying, ‘Well, but I would have done it differently.’”
Did I win? Why yes, I did.
And then there was the second AI-box experiment, with a better-known figure in the community, who said, “I remember when [previous guy] let you out, but that doesn’t constitute a proof. I’m still convinced there is nothing you could say to convince me to let you out of the box.” And I said, “Do you believe that a transhuman AI couldn’t persuade you to let it out?” The one gave it some serious thought, and said, “I can’t imagine anything even a transhuman AI could say to get me to let it out.” “Okay,” I said, “now we have a bet.” A $20 bet, to be exact.
I won that one too.
There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I’m a member, but someone forwarded them to me):
“Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There’s not an AI at the other end AND there’s $10 on the line. Hell, I could type ‘No’ every few minutes into an IRC client for 2 hours while I was reading other webpages!”
“This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can’t imagine anyone being that convincing without being able to provide any tangible incentive to the human.”
“It seems we are talking some serious psychology here. Like Asimov’s Second Foundation level stuff . . .”
“I don’t really see why anyone would take anything the AI player says seriously when there’s $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers.”
It’s little moments like these that keep me going. But anyway . . .
Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened. They are tempted to deny the data.
Now, if you’re one of those people to whom the AI-Box Experiment doesn’t seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here. Just try to put yourself in the frame of mind of those who wrote the above quotes. Imagine that you’re taking on something that seems as ridiculous as the AI-Box Experiment seemed to them. I want to talk about how to do impossible things, and obviously I’m not going to pick an example that’s really impossible.
And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.
So the AI-Box challenge seems impossible to you—either it really does, or you’re pretending it does. What do you do with this impossible challenge?
First, we assume that you don’t actually say “That’s impossible!” and give up a la Luke Skywalker. You haven’t run away.
Why not? Maybe you’ve learned to override the reflex of running away. Or maybe they’re going to shoot your daughter if you fail. We suppose that you want to win, not try—that something is at stake that matters to you, even if it’s just your own pride. (Pride is an underrated sin.)
Will you call upon the virtue of tsuyoku naritai? But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible. You could go into the AI Box experiment once, and then do it again, and try to do better the second time. Will that get you to the point of winning? Not for a long time, maybe; and sometimes a single failure isn’t acceptable.
(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it. How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)
Will you call upon the virtue isshokenmei? But a desperate effort may not be enough to win. Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine. A problem looks impossible when your brain’s query returns no lines of solution leading to it. What good is a desperate effort along any of those lines?
Make an extraordinary effort? Leave your comfort zone—try non-default ways of doing things—even, try to think creatively? But you can imagine the one coming back and saying, “I tried to leave my comfort zone, and I think I succeeded at that! I brainstormed for five minutes—and came up with all sorts of wacky creative ideas! But I don’t think any of them are good enough. The other guy can just keep saying ‘No,’ no matter what I do.”
And now we finally reply: “Shut up and do the impossible!”
As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win. That’s the problem with saying, “Make an extraordinary effort.” You can succeed at the goal of “making an extraordinary effort” without succeeding at the goal of getting out of the Box.
“But!” says the one. “But, SUCCEED is not a primitive action! Not all challenges are fair—sometimes you just can’t win! How am I supposed to choose to be out of the Box? The other guy can just keep on saying ‘No’!”
True. Now shut up and do the impossible.
Your goal is not to do better, to try desperately, or even to try extraordinarily. Your goal is to get out of the box.
To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.
A couple of people have reacted to the AI-Box Experiment by saying, “Well, Eliezer, playing the AI, probably just threatened to destroy the world as soon as he got out, if he wasn’t let out immediately,” or “Maybe the AI offered the Gatekeeper a trillion dollars to let it out.” But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying “No.”
So the people who say, “Well, of course Eliezer must have just done XXX,” and then offer up something that fairly obviously wouldn’t work—would they be able to escape the Box? They’re trying too hard to convince themselves the problem isn’t impossible.
One way to run from the awful tension is to seize on a solution, any solution, even if it’s not very good.
Which is why it’s important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.
I don’t quite want to say that “you should expect to solve the problem.” If you hacked your mind so that you assigned high probability to solving the problem, that wouldn’t accomplish anything. You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.
To have faith that you could solve the problem would just be another way of running from that awful tension.
And yet—you can’t be setting out to try to solve the problem. You can’t be setting out to make an effort. You have to be setting out to win. You can’t be saying to yourself, “And now I’m going to do my best.” You have to be saying to yourself, “And now I’m going to figure out how to get out of the Box”—or reduce consciousness to nonmysterious parts, or whatever.
I say again: You must really intend to solve the problem. If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won’t hold yourself to a high enough standard. You’ll only be trying for the sake of trying. You’ll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, “Oh well.”
No! Not well! You haven’t won yet! Shut up and do the impossible!
When AI folk say to me, “Friendly AI is impossible,” I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try for five minutes before giving up,” and they dutifully agreed to try for five minutes by the clock, then they still wouldn’t come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.
So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1? Or even doublethink to add one iota of credibility to your true estimate?
Of course not. In fact, it is necessary to keep in full view the reasons why you can’t succeed. If you lose sight of why the problem is impossible, you’ll just seize on a false solution. The last fact you want to forget is that the Gatekeeper could always just tell the AI “No”—or that consciousness seems intrinsically different from any possible combination of atoms, etc.
(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)
So you’ve got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.
The awful tension between the two simultaneous views comes from not knowing which will prevail. Not expecting to surely lose, nor expecting to surely win. Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried. The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation. It’s an in-between place, “unknown to death, nor known to life.”
In fiction it’s easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it’s very hard to show someone who shuts up and attempts the impossible. It’s difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don’t know who’s going to win—expecting neither an “astounding” heroic victory just like the last fifty times, nor the default squish.
You might even be justified in refusing to use probabilities at this point. In all honesty, I really don’t know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve—in a case where I’ve previously solved some impossible problems, but the particular impossible problem is more difficult than anything I’ve yet solved, but I plan to work on it longer, et cetera.
People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don’t know how to answer. I’m not being evasive; I don’t know how to put a probability estimate on my, or someone else’s, successfully shutting up and doing the impossible. Is it probability zero because it’s impossible? Obviously not. But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better? It’s not truly impossible; I can see that much. But humanly impossible? Impossible to me in particular? I don’t know how to guess. I can’t even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the “chance” depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.
But I do hope that by now I’ve made it clear why you shouldn’t panic when I now say, clearly and forthrightly, that building a Friendly AI is impossible.
I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for “Just make an AI that does X,” et cetera. Describing the specific flaws would be a whole long story in each case. But the general rule is that you can’t do it, because Friendly AI is impossible. So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible. It does take a mature understanding to appreciate this impossibility, so it’s not surprising that people go around proposing clever shortcuts.
On the AI-Box Experiment, so far I’ve only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading Y Combinator’s Hacker News, and posted a topic called “Ask Eliezer Yudkowsky” that got voted to the front page. To which I replied:
Oh, dear. Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force . . .
All right, this much of a hint:
There’s no super-clever special trick to it. I just did it the hard way.
Something of an entrepreneurial lesson there, I guess.
There was no super-clever special trick that let me get out of the Box using only a cheap effort. I didn’t bribe the other player, or otherwise violate the spirit of the experiment. I just did it the hard way.
Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with. When someone can’t think of any possible argument that would convince them of something, that just means their brain is running a search that hasn’t yet turned up a path. It doesn’t mean they can’t be convinced.
But it illustrates the general point: “Shut up and do the impossible” isn’t the same as expecting to find a cheap way out. That’s only another kind of running away, of reaching for relief.