Rationality: From AI to Zombies


by Eliezer Yudkowsky

Neither did he, but on long walks through the streets of town he thought about it and concluded she was evidently stopped with the same kind of blockage that had paralyzed him on his first day of teaching. She was blocked because she was trying to repeat, in her writing, things she had already heard, just as on the first day he had tried to repeat things he had already decided to say. She couldn’t think of anything to write about Bozeman because she couldn’t recall anything she had heard worth repeating. She was strangely unaware that she could look and see freshly for herself, as she wrote, without primary regard for what had been said before. The narrowing down to one brick destroyed the blockage because it was so obvious she had to do some original and direct seeing.

—Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance

*

1. Pirsig, Zen and the Art of Motorcycle Maintenance.

93

Stranger than History

Suppose I told you that I knew for a fact that the following statements were true:

If you paint yourself a certain exact color between blue and green, it will reverse the force of gravity on you and cause you to fall upward.

In the future, the sky will be filled by billions of floating black spheres. Each sphere will be larger than all the zeppelins that have ever existed put together. If you offer a sphere money, it will lower a male prostitute out of the sky on a bungee cord.

Your grandchildren will think it is not just foolish, but evil, to put thieves in jail instead of spanking them.

You’d think I was crazy, right?

Now suppose it were the year 1901, and you had to choose between believing those statements I have just offered, and believing statements like the following:

There is an absolute speed limit on how fast two objects can seem to be traveling relative to each other, which is exactly 670,616,629.2 miles per hour. If you hop on board a train going almost this fast and fire a gun out the window, the fundamental units of length change around, so it looks to you like the bullet is speeding ahead of you, but other people see something different. Oh, and time changes around too.

In the future, there will be a superconnected global network of billions of adding machines, each one of which has more power than all pre-1901 adding machines put together. One of the primary uses of this network will be to transport moving pictures of lesbian sex by pretending they are made out of numbers.

Your grandchildren will think it is not just foolish, but evil, to say that someone should not be President of the United States because she is black.

Based on a comment of Robin Hanson’s: “I wonder if one could describe in enough detail a fictional story of an alternative reality, a reality that our ancestors could not distinguish from the truth, in order to make it very clear how surprising the truth turned out to be.”

*

94

The Logical Fallacy of Generalization from Fictional Evidence

When I try to introduce the subject of advanced AI, what’s the first thing I hear, more than half the time?

“Oh, you mean like the Terminator movies / The Matrix / Asimov’s robots!”

And I reply, “Well, no, not exactly. I try to avoid the logical fallacy of generalizing from fictional evidence.”

Some people get it right away, and laugh. Others defend their use of the example, disagreeing that it’s a fallacy.

What’s wrong with using movies or novels as starting points for the discussion? No one’s claiming that it’s true, after all. Where is the lie, where is the rationalist sin? Science fiction represents the author’s attempt to visualize the future; why not take advantage of the thinking that’s already been done on our behalf, instead of starting over?

Not every misstep in the precise dance of rationality consists of outright belief in a falsehood; there are subtler ways to go wrong.

First, let us dispose of the notion that science fiction represents a full-fledged rational attempt to forecast the future. Even the most diligent science fiction writers are, first and foremost, storytellers; the requirements of storytelling are not the same as the requirements of forecasting. As Nick Bostrom points out:1

When was the last time you saw a movie about humankind suddenly going extinct (without warning and without being replaced by some other civilization)? While this scenario may be much more probable than a scenario in which human heroes successfully repel an invasion of monsters or robot warriors, it wouldn’t be much fun to watch.

So there are specific distortions in fiction. But trying to correct for these specific distortions is not enough. A story is never a rational attempt at analysis, not even with the most diligent science fiction writers, because stories don’t use probability distributions. I illustrate as follows:

Bob Merkelthud slid cautiously through the door of the alien spacecraft, glancing right and then left (or left and then right) to see whether any of the dreaded Space Monsters yet remained. At his side was the only weapon that had been found effective against the Space Monsters, a Space Sword forged of pure titanium with 30% probability, an ordinary iron crowbar with 20% probability, and a shimmering black discus found in the smoking ruins of Stonehenge with 45% probability, the remaining 5% being distributed over too many minor outcomes to list here.

Merkelthud (though there’s a significant chance that Susan Wifflefoofer was there instead) took two steps forward or one step back, when a vast roar split the silence of the black airlock! Or the quiet background hum of the white airlock! Although Amfer and Woofi (1997) argue that Merkelthud is devoured at this point, Spacklebackle (2003) points out that—

Characters can be ignorant, but the author can’t say the three magic words “I don’t know.” The protagonist must thread a single line through the future, full of the details that lend flesh to the story, from Wifflefoofer’s appropriately futuristic attitudes toward feminism, down to the color of her earrings.

Then all these burdensome details and questionable assumptions are wrapped up and given a short label, creating the illusion that they are a single package.

On problems with large answer spaces, the greatest difficulty is not verifying the correct answer but simply locating it in answer space to begin with. If someone starts out by asking whether or not AIs are gonna put us into capsules like in The Matrix, they’re jumping to a 100-bit proposition, without a corresponding 98 bits of evidence to locate it in the answer space as a possibility worthy of explicit consideration. It would only take a handful more evidence after the first 98 bits to promote that possibility to near-certainty, which tells you something about where nearly all the work gets done.
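The arithmetic behind that claim can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the text: a "100-bit proposition" is read as one with prior odds of 1 to 2^100, and each "bit of evidence" is read as an observation that doubles the odds in its favor. The function name `posterior` is hypothetical.

```python
from fractions import Fraction


def posterior(prior_bits: int, evidence_bits: int) -> Fraction:
    """Posterior probability of a proposition with prior odds 1 : 2**prior_bits
    after evidence_bits bits of evidence (assumption: each bit doubles the odds)."""
    odds = Fraction(2) ** (evidence_bits - prior_bits)
    return odds / (1 + odds)


# How the probability moves as evidence accumulates for a 100-bit proposition.
for evidence in (0, 98, 100, 105, 110):
    print(f"{evidence:3d} bits of evidence -> probability {float(posterior(100, evidence)):.6g}")
```

Under these assumptions the first 98 bits lift the proposition from roughly 1 in 2^100 to exactly 1 in 5, which is the work of making it worth considering at all. Twelve further bits take it past 99.9%.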

The “preliminary” step of locating possibilities worthy of explicit consideration includes steps like: Weighing what you know and don’t know, what you can and can’t predict, making a deliberate effort to avoid absurdity bias and widen confidence intervals, pondering which questions are the important ones, trying to adjust for possible Black Swans and think of (formerly) unknown unknowns. Jumping to “The Matrix: Yes or No?” skips over all of this.

Any professional negotiator knows that to control the terms of a debate is very nearly to control the outcome of the debate. If you start out by thinking of The Matrix, it brings to mind marching robot armies defeating humans after a long struggle—not a superintelligence snapping nanotechnological fingers. It focuses on an “Us vs. Them” struggle, directing attention to questions like “Who will win?” and “Who should win?” and “Will AIs really be like that?” It creates a general atmosphere of entertainment, of “What is your amazing vision of the future?”

Lost to the echoing emptiness are: considerations of more than one possible mind design that an “Artificial Intelligence” could implement; the future’s dependence on initial conditions; the power of smarter-than-human intelligence and the argument for its unpredictability; people taking the whole matter seriously and trying to do something about it.

If some insidious corrupter of debates decided that their preferred outcome would be best served by forcing discussants to start out by refuting Terminator, they would have done well in skewing the frame. Debating gun control, the NRA spokesperson does not wish to be introduced as a “shooting freak,” the anti-gun opponent does not wish to be introduced as a “victim disarmament advocate.” Why should you allow the same order of frame-skewing by Hollywood scriptwriters, even accidentally?

Journalists don’t tell me, “The future will be like 2001.” But they ask, “Will the future be like 2001, or will it be like A.I.?” This is just as huge a framing issue as asking “Should we cut benefits for disabled veterans, or raise taxes on the rich?”

In the ancestral environment, there were no moving pictures; what you saw with your own eyes was true. A momentary glimpse of a single word can prime us and make compatible thoughts more available, with demonstrated strong influence on probability estimates. How much havoc do you think a two-hour movie can wreak on your judgment? It will be hard enough to undo the damage by deliberate concentration—why invite the vampire into your house? In Chess or Go, every wasted move is a loss; in rationality, any non-evidential influence is (on average) entropic.

Do movie-viewers succeed in unbelieving what they see? So far as I can tell, few movie viewers act as if they have directly observed Earth’s future. People who watched the Terminator movies didn’t hide in fallout shelters on August 29, 1997. But those who commit the fallacy seem to act as if they had seen the movie events occurring on some other planet; not Earth, but somewhere similar to Earth.

You say, “Suppose we build a very smart AI,” and they say, “But didn’t that lead to nuclear war in The Terminator?” As far as I can tell, it’s identical reasoning, down to the tone of voice, of someone who might say: “But didn’t that lead to nuclear war on Alpha Centauri?” or “Didn’t that lead to the fall of the Italian city-state of Piccolo in the fourteenth century?” The movie is not believed, but it is available. It is treated, not as a prophecy, but as an illustrative historical case. Will history repeat itself? Who knows?

In a recent intelligence explosion discussion, someone mentioned that Vinge didn’t seem to think that brain-computer interfaces would increase intelligence much, and cited Marooned in Realtime and Tunç Blumenthal, who was the most advanced traveller but didn’t seem all that powerful. I replied indignantly, “But Tunç lost most of his hardware! He was crippled!” And then I did a mental double-take and thought to myself: What the hell am I saying.

Does the issue not have to be argued in its own right, regardless of how Vinge depicted his characters? Tunç Blumenthal is not “crippled,” he’s unreal. I could say “Vinge chose to depict Tunç as crippled, for reasons that may or may not have had anything to do with his personal best forecast,” and that would give his authorial choice an appropriate weight of evidence. I cannot say “Tunç was crippled.” There is no was of Tunç Blumenthal.

I deliberately left in a mistake I made, in my first draft of the beginning of this essay: “Others defend their use of the example, disagreeing that it’s a fallacy.” But The Matrix is not an example!

A neighboring flaw is the logical fallacy of arguing from imaginary evidence: “Well, if you did go to the end of the rainbow, you would find a pot of gold—which just proves my point!” (Updating on evidence predicted, but not observed, is the mathematical mirror image of hindsight bias.)

The brain has many mechanisms for generalizing from observation, not just the availability heuristic. You see three zebras, you form the category “zebra,” and this category embodies an automatic perceptual inference. Horse-shaped creatures with white and black stripes are classified as “Zebras,” therefore they are fast and good to eat; they are expected to be similar to other zebras observed.

So people see (moving pictures of) three Borg, their brain automatically creates the category “Borg,” and they infer automatically that humans with brain-computer interfaces are of class “Borg” and will be similar to other Borg observed: cold, uncompassionate, dressing in black leather, walking with heavy mechanical steps. Journalists don’t believe that the future will contain Borg—they don’t believe Star Trek is a prophecy. But when someone talks about brain-computer interfaces, they think, “Will the future contain Borg?” Not, “How do I know computer-assisted telepathy makes people less nice?” Not, “I’ve never seen a Borg and never has anyone else.” Not, “I’m forming a racial stereotype based on literally zero evidence.”

As George Orwell said of cliches:2

What is above all needed is to let the meaning choose the word, and not the other way around . . . When you think of something abstract you are more inclined to use words from the start, and unless you make a conscious effort to prevent it, the existing dialect will come rushing in and do the job for you, at the expense of blurring or even changing your meaning.

Yet in my estimation, the most damaging aspect of using other authors’ imaginations is that it stops people from using their own. As Robert Pirsig said:3

She was blocked because she was trying to repeat, in her writing, things she had already heard, just as on the first day he had tried to repeat things he had already decided to say. She couldn’t think of anything to write about Bozeman because she couldn’t recall anything she had heard worth repeating. She was strangely unaware that she could look and see freshly for herself, as she wrote, without primary regard for what had been said before.

Remembered fictions rush in and do your thinking for you; they substitute for seeing—the deadliest convenience of all.

*

1. Nick Bostrom, “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards,” Journal of Evolution and Technology 9 (2002), http://www.jetpress.org/volume9/risks.html.

2. Orwell, “Politics and the English Language.”

3. Pirsig, Zen and the Art of Motorcycle Maintenance.

95

The Virtue of Narrowness

What is true of one apple may not be true of another apple; thus more can be said about a single apple than about all the apples in the world.

—The Twelve Virtues of Rationality

Within their own professions, people grasp the importance of narrowness; a car mechanic knows the difference between a carburetor and a radiator, and would not think of them both as “car parts.” A hunter-gatherer knows the difference between a lion and a panther. A janitor does not wipe the floor with window cleaner, even if the bottles look similar to one who has not mastered the art.

Outside their own professions, people often commit the misstep of trying to broaden a word as widely as possible, to cover as much territory as possible. Is it not more glorious, more wise, more impressive, to talk about all the apples in the world? How much loftier it must be to explain human thought in general, without being distracted by smaller questions, such as how humans invent techniques for solving a Rubik’s Cube. Indeed, it scarcely seems necessary to consider specific questions at all; isn’t a general theory a worthy enough accomplishment on its own?

It is the way of the curious to lift up one pebble from among a million pebbles on the shore, and see something new about it, something interesting, something different. You call these pebbles “diamonds,” and ask what might be special about them—what inner qualities they might have in common, beyond the glitter you first noticed. And then someone else comes along and says: “Why not call this pebble a diamond too? And this one, and this one?” They are enthusiastic, and they mean well. For it seems undemocratic and exclusionary and elitist and unholistic to call some pebbles “diamonds,” and others not. It seems . . . narrow-minded . . . if you’ll pardon the phrase. Hardly open, hardly embracing, hardly communal.

You might think it poetic, to give one word many meanings, and thereby spread shades of connotation all around. But even poets, if they are good poets, must learn to see the world precisely. It is not enough to compare love to a flower. Hot jealous unconsummated love is not the same as the love of a couple married for decades. If you need a flower to symbolize jealous love, you must go into the garden, and look, and make subtle distinctions—find a flower with a heady scent, and a bright color, and thorns. Even if your intent is to shade meanings and cast connotations, you must keep precise track of exactly which meanings you shade and connote.

It is a necessary part of the rationalist’s art—or even the poet’s art!—to focus narrowly on unusual pebbles which possess some special quality. And look at the details which those pebbles—and those pebbles alone!—share among each other. This is not a sin.

It is perfectly all right for modern evolutionary biologists to explain just the patterns of living creatures, and not the “evolution” of stars or the “evolution” of technology. Alas, some unfortunate souls use the same word “evolution” to cover the naturally selected patterns of replicating life, and the strictly accidental structure of stars, and the intelligently configured structure of technology. And as we all know, if people use the same word, it must all be the same thing. We should automatically generalize anything we think we know about biological evolution to technology. Anyone who tells us otherwise must be a mere pointless pedant. It couldn’t possibly be that our ignorance of modern evolutionary theory is so total that we can’t tell the difference between a carburetor and a radiator. That’s unthinkable. No, the other person—you know, the one who’s studied the math—is just too dumb to see the connections.
