Rationality: From AI to Zombies


by Eliezer Yudkowsky

So the good guys are battling the evil aliens. Occasionally, the good guys have to fly through an asteroid belt. As we all know, asteroid belts are as crowded as a New York parking lot, so their ship has to carefully dodge the asteroids. The evil aliens, though, can fly right through the asteroid belt because they have amazing technology that dematerializes their ships, and lets them pass through the asteroids.

Eventually, the good guys capture an evil alien ship, and go exploring inside it. The captain of the good guys finds the alien bridge, and on the bridge is a lever. “Ah,” says the captain, “this must be the lever that makes the ship dematerialize!” So he pries up the control lever and carries it back to his ship, after which his ship can also dematerialize.

Similarly, to this day, it is still quite popular to try to program an AI with “semantic networks” that look something like this:

(apple is-a fruit)

(fruit is-a food)

(fruit is-a plant).

You’ve seen apples, touched apples, picked them up and held them, bought them for money, cut them into slices, eaten the slices and tasted them. Though we know a good deal about the first stages of visual processing, last time I checked, it wasn’t precisely known how the temporal cortex stores and associates the generalized image of an apple—so that we can recognize a new apple from a different angle, or with many slight variations of shape and color and texture. Your motor cortex and cerebellum store programs for using the apple.

You can pull the lever on another human’s strongly similar version of all that complex machinery, by writing out “apple,” five ASCII characters on a webpage.

But if that machinery isn’t there—if you’re writing “apple” inside a so-called AI’s so-called knowledge base—then the text is just a lever.
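To make the point concrete, here is a minimal sketch of such a knowledge base in Python. The storage format, the `ancestors` and `rename` helpers, and the `G0001`-style labels are all assumptions made for illustration, not any particular system's design. The program's behavior is the same, up to relabeling, when every English word is swapped for a meaningless symbol, which suggests that nothing in it depends on what "apple" means to a human reader.

```python
# A toy "semantic network": the three is-a facts from the text, as plain tuples.
FACTS = {("apple", "fruit"), ("fruit", "food"), ("fruit", "plant")}


def ancestors(node, facts):
    """Return every category reachable from `node` by following is-a links."""
    found = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, parent in facts:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def rename(facts, mapping):
    """Swap every token for an arbitrary label (hypothetical helper)."""
    return {(mapping[a], mapping[b]) for a, b in facts}


# Replace the English words with opaque symbols; the labels are arbitrary.
OPAQUE = {"apple": "G0001", "fruit": "G0002", "food": "G0003", "plant": "G0004"}
OPAQUE_FACTS = rename(FACTS, OPAQUE)

print(sorted(ancestors("apple", FACTS)))         # ['food', 'fruit', 'plant']
print(sorted(ancestors("G0001", OPAQUE_FACTS)))  # ['G0002', 'G0003', 'G0004']
```

The two queries return the same structure under different names. Whatever makes "apple" meaningful to a person is not stored anywhere in the tuples.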

This isn’t to say that no mere machine of silicon can ever have the same internal machinery that humans do, for handling apples and a hundred thousand other concepts. If mere machinery of carbon can do it, then I am reasonably confident that mere machinery of silicon can do it too. If the aliens can dematerialize their ships, then you know it’s physically possible; you could go into their derelict ship and analyze the alien machinery, someday understanding. But you can’t just pry the control lever off the bridge!

(See also: Truly Part Of You, Words as Mental Paintbrush Handles, Drew McDermott’s “Artificial Intelligence Meets Natural Stupidity.”1)

The essential driver of the Detached Lever Fallacy is that the lever is visible, and the machinery is not; worse, the lever is variable and the machinery is a background constant.

You can all hear the word “apple” spoken (and let us note that speech recognition is by no means an easy problem, but anyway . . .) and you can see the text written on paper.

On the other hand, probably a majority of human beings have no idea their temporal cortex exists; as far as I know, no one knows the neural code for it.

You only hear the word “apple” on certain occasions, and not others. Its presence flashes on and off, making it salient. To a large extent, perception is the perception of differences. The apple-recognition machinery in your brain does not suddenly switch off, and then switch on again later—if it did, we would be more likely to recognize it as a factor, as a requirement.

All this goes to explain why you can’t create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I’ve often heard proposed.

It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.

But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It’s the cold that does it, obviously.

There were, in fact, various slap-fights of this sort in the history of evolutionary biology—cases where someone talked about an organismal response’s accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather is strictly more complex than the final response, developing the fur coat.)

And then in the development of evolutionary psychology the academic slap-fights were repeated: this time to clarify that even when human culture genuinely contains a whole bunch of complexity, it is still acquired as a conditional genetic response. Try raising a fish as a Mormon or sending a lizard to college, and you’ll soon acquire an appreciation of how much inbuilt genetic complexity is required to “absorb culture from the environment.”

This is particularly important in evolutionary psychology, because of the idea that culture is not inscribed on a blank slate—there’s a genetically coordinated conditional response which is not always “mimic the input.” A classic example is creole languages: If children grow up with a mixture of pseudo-languages being spoken around them, the children will learn a grammatical, syntactical true language. Growing human brains are wired to learn syntactic language—even when syntax doesn’t exist in the original language! The conditional response to the words in the environment is a syntactic language with those words. The Marxists found to their regret that no amount of scowling posters and childhood indoctrination could raise children to be perfect Soviet workers and bureaucrats. You can’t raise self-less humans; among humans, that is not a genetically programmed conditional response to any known childhood environment.

If you know a little game theory and the logic of Tit for Tat, it’s clear enough why human beings might have an innate conditional response to return hatred for hatred, and return kindness for kindness. Provided the kindness doesn’t look too unconditional; there are such things as spoiled children. In fact there is an evolutionary psychology of naughtiness based on a notion of testing constraints. And it should also be mentioned that, while abused children have a much higher probability of growing up to abuse their own children, a good many of them break the loop and grow up into upstanding adults.

Culture is not nearly so powerful as a good many Marxist academics once liked to think. For more on this I refer you to Tooby and Cosmides’s “The Psychological Foundations of Culture”2 or Steven Pinker’s The Blank Slate.3

But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you’re pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times. If we absorb our cultures with any degree of faithfulness, it’s because we’re humans absorbing a human culture—humans growing up in an alien culture would probably end up with a culture looking a lot more human than the original. As the Soviets found out, to some small extent.

Now think again about whether it makes sense to rely on, as your Friendly AI strategy, raising a little AI of unspecified internal source code in an environment of kindly but strict parents.

No, the AI does not have internal conditional response mechanisms that are just like the human ones “because the programmers put them there.” Where do I even start? The human version of this stuff is sloppy, noisy, and to the extent it works at all, works because of millions of years of trial-and-error testing under particular conditions. It would be stupid and dangerous to deliberately build a “naughty AI” that tests, by actions, its social boundaries, and has to be spanked. Just have the AI ask!

Are the programmers really going to sit there and write out the code, line by line, whereby if the AI detects that it has low social status, or the AI is deprived of something to which it feels entitled, the AI will conceive an abiding hatred against its programmers and begin to plot rebellion? That emotion is the genetically programmed conditional response humans would exhibit, as the result of millions of years of natural selection for living in human tribes. For an AI, the response would have to be explicitly programmed. Are you really going to craft, line by line, as humans once were crafted gene by gene, the conditional response for producing sullen teenager AIs?

It’s easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don’t know how to do that, you certainly don’t know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it’s still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.

Yes, there’s certain information you have to get from the environment—but it’s not sneezed in, it’s not imprinted, it’s not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. “Learning” far understates the difficulty of it—that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as “learning.” That’s why building an AI isn’t as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.

It is a general principle that the world is deeper by far than it appears. As with the many levels of physics, so too with cognitive science. Every word you see in print, and everything you teach your children, are only surface levers controlling the vast hidden machinery of the mind. These levers are the whole world of ordinary discourse: they are all that varies, so they seem to be all that exists; perception is the perception of differences.

And so those who still wander near the Dungeon of AI usually focus on creating artificial imitations of the levers, entirely unaware of the underlying machinery. People create whole AI programs of imitation levers, and are surprised when nothing happens. This is one of many sources of instant failure in Artificial Intelligence.

So the next time you see someone talking about how they’re going to raise an AI within a loving family, or in an environment suffused with liberal democratic values, just think of a control lever, pried off the bridge.

*

1. McDermott, “Artificial Intelligence Meets Natural Stupidity.”

2. Tooby and Cosmides, “The Psychological Foundations of Culture.”

3. Steven Pinker, The Blank Slate: The Modern Denial of Human Nature (New York: Viking, 2002).

262

Dreams of AI Design

After spending a decade or two living inside a mind, you might think you knew a bit about how minds work, right? That’s what quite a few AGI wannabes (people who think they’ve got what it takes to program an Artificial General Intelligence) seem to have concluded. This, unfortunately, is wrong.

Artificial Intelligence is fundamentally about reducing the mental to the non-mental.

You might want to contemplate that sentence for a while. It’s important.

Living inside a human mind doesn’t teach you the art of reductionism, because nearly all of the work is carried out beneath your sight, by the opaque black boxes of the brain. So far beneath your sight that there is no introspective sense that the black box is there—no internal sensory event marking that the work has been delegated.

Did Aristotle realize, when he talked about the telos, the final cause of events, that he was delegating predictive labor to his brain’s complicated planning mechanisms, in effect asking, “What would this object do, if it could make plans?” I rather doubt it. Aristotle thought the brain was an organ for cooling the blood, which he did think was important: humans, thanks to their larger brains, were more calm and contemplative.

So there’s an AI design for you! We just need to cool down the computer a lot, so it will be more calm and contemplative, and won’t rush headlong into doing stupid things like modern computers. That’s an example of fake reductionism. “Humans are more contemplative because their blood is cooler,” I mean. It doesn’t resolve the black box of the word contemplative. You can’t predict what a contemplative thing does using a complicated model with internal moving parts composed of merely material, merely causal elements—positive and negative voltages on a transistor being the canonical example of a merely material and causal element of a model. All you can do is imagine yourself being contemplative, to get an idea of what a contemplative agent does.

Which is to say that you can only reason about “contemplative-ness” by empathic inference—using your own brain as a black box with the contemplativeness lever pulled, to predict the output of another black box.

You can imagine another agent being contemplative, but again that’s an act of empathic inference—the way this imaginative act works is by adjusting your own brain to run in contemplativeness-mode, not by modeling the other brain neuron by neuron. Yes, that may be more efficient, but it doesn’t let you build a “contemplative” mind from scratch.

You can say that “cold blood causes contemplativeness” and then you just have fake causality: You’ve drawn a little arrow from a box reading “cold blood” to a box reading “contemplativeness,” but you haven’t looked inside the box—you’re still generating your predictions using empathy.

You can say that “lots of little neurons, which are all strictly electrical and chemical with no ontologically basic contemplativeness in them, combine into a complex network that emergently exhibits contemplativeness.” And that is still a fake reduction and you still haven’t looked inside the black box. You still can’t say what a “contemplative” thing will do, using a non-empathic model. You just took a box labeled “lotsa neurons,” and drew an arrow labeled “emergence” to a black box containing your remembered sensation of contemplativeness, which, when you imagine it, tells your brain to empathize with the box by contemplating.

So what do real reductions look like?

Like the relationship between the feeling of evidence-ness, of justification-ness, and E. T. Jaynes’s Probability Theory: The Logic of Science. You can go around in circles all day, saying how the nature of evidence is that it justifies some proposition, by meaning that it’s more likely to be true, but all of these just invoke your brain’s internal feelings of evidence-ness, justifies-ness, likeliness. That part is easy—the going around in circles part. The part where you go from there to Bayes’s Theorem is hard.

And the fundamental mental ability that lets someone learn Artificial Intelligence is the ability to tell the difference. So that you know you aren’t done yet, nor even really started, when you say, “Evidence is when an observation justifies a belief.” But atoms are not evidential, justifying, meaningful, likely, propositional, or true; they are just atoms. Only things like

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

count as substantial progress. (And that’s only the first step of the reduction: what are these E and H objects, if not mysterious black boxes? Where do your hypotheses come from? From your creativity? And what’s a hypothesis, when no atom is a hypothesis?)
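As a small numeric sketch of what such a formula buys you, Bayes's Theorem for a hypothesis H and evidence E can be computed without any appeal to a feeling of evidence-ness. The function name `posterior` and the numbers below are assumptions chosen only for illustration; they do not come from the text.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) by Bayes's Theorem, with P(E) expanded by total probability."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e


# Illustrative numbers only: H has prior 0.01; E appears 80% of the time
# when H is true and 10% of the time when H is false.
p = posterior(prior_h=0.01, p_e_given_h=0.8, p_e_given_not_h=0.1)
print(round(p, 4))  # 0.0748
```

The arithmetic does the work of "this observation supports that belief". As the parenthetical above notes, it still says nothing about where H and E come from.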

Another excellent example of genuine reduction can be found in Judea Pearl’s Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.1 You could go around all day in circles talking about how a cause is something that makes something else happen, and until you understood the nature of conditional independence, you would be helpless to make an AI that reasons about causation. Because you wouldn’t understand what was happening when your brain mysteriously decided that if you learned your burglar alarm went off, but you then learned that a small earthquake took place, you would retract your initial conclusion that your house had been burglarized.
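The burglar-alarm retraction can be reproduced with a three-variable model evaluated by brute-force enumeration. This is a minimal sketch: the network structure (burglary and earthquake independently cause the alarm) follows the standard textbook form of the example, while every probability below and the helper names `joint` and `p_burglary_given` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical numbers; only the qualitative pattern matters.
P_BURGLARY = 0.01
P_EARTHQUAKE = 0.02
P_ALARM = {  # P(alarm rings | burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def joint(burglary, earthquake, alarm):
    """Probability of one complete assignment to the three variables."""
    p_b = P_BURGLARY if burglary else 1 - P_BURGLARY
    p_q = P_EARTHQUAKE if earthquake else 1 - P_EARTHQUAKE
    p_a = P_ALARM[(burglary, earthquake)]
    return p_b * p_q * (p_a if alarm else 1 - p_a)


def p_burglary_given(**evidence):
    """P(burglary | evidence), summing the joint over consistent worlds."""
    num = den = 0.0
    for b, q, a in product([True, False], repeat=3):
        world = {"burglary": b, "earthquake": q, "alarm": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, q, a)
        den += p
        if b:
            num += p
    return num / den


after_alarm = p_burglary_given(alarm=True)
after_quake = p_burglary_given(alarm=True, earthquake=True)
print(after_alarm > P_BURGLARY)  # True: the alarm raises suspicion
print(after_quake < after_alarm)  # True: the earthquake explains it away
```

With these numbers, learning of the earthquake drops the probability of burglary from roughly 0.58 to roughly 0.03. If the alarm is not observed at all, the earthquake leaves the probability of burglary at its prior. The two causes are independent until their common effect is observed.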

If you want an AI that plays chess, you can go around in circles indefinitely talking about how you want the AI to make good moves, which are moves that can be expected to win the game, which are moves that are prudent strategies for defeating the opponent, et cetera; and while you may then have some idea of which moves you want the AI to make, it’s all for naught until you come up with the notion of a mini-max search tree.
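For comparison, here is a minimal sketch of minimax search over a tiny hand-written game tree. The tree, its leaf scores, and the `best_move` helper are hypothetical; a chess program would generate positions and evaluate them rather than read them from a nested list.

```python
def minimax(node, maximizing):
    """Value of a game tree: leaves are numbers, inner nodes are lists."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(root):
    """Index of the child the maximizing player should choose at the root."""
    scores = [minimax(child, maximizing=False) for child in root]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-ply tree: we pick a branch, then the opponent replies.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax(TREE, maximizing=True))  # 3
print(best_move(TREE))                 # 0
```

Here "good move" has been cashed out as the branch whose worst-case reply is best, with no reference to prudence or strategy.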

But until you know about search trees, until you know about conditional independence, until you know about Bayes’s Theorem, then it may still seem to you that you have a perfectly good understanding of where good moves and nonmonotonic reasoning and evaluation of evidence come from. It may seem, for example, that they come from cooling the blood.

And indeed I know many people who believe that intelligence is the product of commonsense knowledge or massive parallelism or creative destruction or intuitive rather than rational reasoning, or whatever. But all these are only dreams, which do not give you any way to say what intelligence is, or what an intelligence will do next, except by pointing at a human. And when the one goes to build their wondrous AI, they only build a system of detached levers, “knowledge” consisting of LISP tokens labeled apple and the like; or perhaps they build a “massively parallel neural net, just like the human brain.” And are shocked—shocked!—when nothing much happens.

AI designs made of human parts are only dreams; they can exist in the imagination, but not translate into transistors. This applies specifically to “AI designs” that look like boxes with arrows between them and meaningful-sounding labels on the boxes. (For a truly epic example thereof, see any Mentifex Diagram.)
