Rationality: From AI to Zombies


by Eliezer Yudkowsky


  The first great failure of those who try to consider Friendly AI is the One Great Moral Principle That Is All We Need To Program—a.k.a. the fake utility function—and of this I have already spoken.

  But the even worse failure is the One Great Moral Principle We Don’t Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g., John C. Wright, Marc Geddes.)

  There is also a less severe version of the failure, where the one does not declare the One True Morality. Rather the one hopes for an AI created perfectly free, unconstrained by flawed humans desiring slaves, so that the AI may arrive at virtue of its own accord—virtue undreamed-of perhaps by the speaker, who confesses themselves too flawed to teach an AI. (E.g., John K. Clark, Richard Hollerith?, Eliezer1996.) This is a less tainted motive than the dream of absolute command. But though this dream arises from virtue rather than vice, it is still based on a flawed understanding of freedom, and will not actually work in real life. Of this, more to follow, of course.

  John C. Wright, who was previously writing a very nice transhumanist trilogy (first book: The Golden Age), inserted a huge Author Filibuster in the middle of his climactic third book, describing in tens of pages his Universal Morality That Must Persuade Any AI. I don’t know if anything happened after that, because I stopped reading. And then Wright converted to Christianity—yes, seriously. So you really don’t want to fall into this trap!¹

  *

  1. Just kidding.

  267

  Created Already in Motion

  Lewis Carroll, who was also a mathematician, once wrote a short dialogue called “What the Tortoise said to Achilles.” If you have not yet read this ancient classic, consider doing so now.

  The Tortoise offers Achilles a step of reasoning drawn from Euclid’s First Proposition:

  (A) Things that are equal to the same are equal to each other.

  (B) The two sides of this Triangle are things that are equal to the same.

  (Z) The two sides of this Triangle are equal to each other.

  Tortoise: “And if some reader had not yet accepted A and B as true, he might still accept the sequence as a valid one, I suppose?”

  Achilles: “No doubt such a reader might exist. He might say, ‘I accept as true the Hypothetical Proposition that, if A and B be true, Z must be true; but, I don’t accept A and B as true.’ Such a reader would do wisely in abandoning Euclid, and taking to football.”

  Tortoise: “And might there not also be some reader who would say, ‘I accept A and B as true, but I don’t accept the Hypothetical’?”

  Achilles, unwisely, concedes this; and so asks the Tortoise to accept another proposition:

  (C) If A and B are true, Z must be true.

  But, asks the Tortoise, suppose that he accepts A and B and C, but not Z?

  Then, says Achilles, he must ask the Tortoise to accept one more hypothetical:

  (D) If A and B and C are true, Z must be true.

  Douglas Hofstadter paraphrased the argument some time later:

  ACHILLES: “If you have [(A and B) → Z], and you also have (A and B), then surely you have Z.”

  TORTOISE: “Oh! You mean ((A and B) and [(A and B) → Z]) → Z, don’t you?”

  As Hofstadter says, “Whatever Achilles considers a rule of inference, the Tortoise immediately flattens into a mere string of the system. If you use only the letters A, B, and Z, you will get a recursive pattern of longer and longer strings.”

  This is the anti-pattern I call Passing the Recursive Buck; and though the counterspell is sometimes hard to find, when found, it generally takes the form The Buck Stops Immediately.

  The Tortoise’s mind needs the dynamic of adding Y to the belief pool when X and (X → Y) are previously in the belief pool. If this dynamic is not present—a rock, for example, lacks it—then you can go on adding in X and (X → Y) and ((X and (X → Y)) → Y) until the end of eternity, without ever getting to Y.

  The phrase that once came into my mind to describe this requirement is that a mind must be created already in motion. There is no argument so compelling that it will give dynamics to a static thing. There is no computer program so persuasive that you can run it on a rock.
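  To see where the buck stops in mechanical terms, here is a minimal Python sketch (my own illustration, not anything from Carroll or Hofstadter): the implication sits in the pool as inert data, and only a separately implemented rule, actually executed, ever adds Y.

  # A minimal sketch: beliefs as stored data vs. inference as a dynamic.
  # An implication (X -> Y) is stored as the tuple ("implies", "X", "Y").

  def modus_ponens_step(beliefs):
      """The dynamic: if X and (X -> Y) are both in the pool, add Y."""
      derived = {b[2] for b in beliefs
                 if isinstance(b, tuple) and b[0] == "implies" and b[1] in beliefs}
      return beliefs | derived

  rock = {"X", ("implies", "X", "Y")}   # a rock can store this, but nothing runs
  mind = modus_ponens_step(rock)        # a mind already in motion applies the rule

  assert "Y" in mind
  assert "Y" not in rock   # piling more stored implications onto the rock changes nothing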

  And even if you have a mind that does carry out modus ponens, it is futile for it to have such beliefs as . . .

  (A) If a toddler is on the train tracks, then pulling them off is fuzzle.

  (B) There is a toddler on the train tracks.

  . . . unless the mind also implements:

  Dynamic: When the belief pool contains “X is fuzzle,” send X to the action system.

  By “dynamic” I mean a property of a physically implemented cognitive system’s development over time. A “dynamic” is something that happens inside a cognitive system, not data that it stores in memory and manipulates. Dynamics are the manipulations. There is no way to write a dynamic on a piece of paper, because the paper will just lie there. So the text immediately above, which says “dynamic,” is not dynamic. If I wanted the text to be dynamic and not just say “dynamic,” I would have to write a Java applet.
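  Here is a toy Python sketch of that distinction (my own illustration; “fuzzle” and the belief strings are the essay’s hypotheticals). The belief pool is data; the dynamic is a function that actually runs and inspects it.

  # Toy illustration: the dynamic is executable machinery, not a stored sentence.

  beliefs = {"pulling the toddler off the tracks is fuzzle",
             "there is a toddler on the train tracks"}

  def action_system(plan):
      print("EXECUTING:", plan)

  def fuzzle_dynamic(belief_pool):
      """When the pool contains 'X is fuzzle', send X to the action system."""
      suffix = " is fuzzle"
      for belief in belief_pool:
          if belief.endswith(suffix):
              action_system(belief[:-len(suffix)])

  fuzzle_dynamic(beliefs)  # without this call, the beliefs just sit there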

  Needless to say, having the belief . . .

  (C) If the belief pool contains “X is fuzzle,” then “send ‘X’ to the action system” is fuzzle.

  . . . won’t help unless the mind already implements the behavior of translating hypothetical actions labeled “fuzzle” into actual motor actions.

  By dint of careful arguments about the nature of cognitive systems, you might be able to prove . . .

  (D) A mind with a dynamic that sends plans labeled “fuzzle” to the action system is more fuzzle than minds that don’t.

  . . . but that still won’t help, unless the listening mind previously possessed the dynamic of swapping out its current source code for alternative source code that is believed to be more fuzzle.

  This is why you can’t argue fuzzleness into a rock.

  *

  268

  Sorting Pebbles into Correct Heaps

  Once upon a time there was a strange little species—that might have been biological, or might have been synthetic, and perhaps were only a dream—whose passion was sorting pebbles into correct heaps.

  They couldn’t tell you why some heaps were correct, and some incorrect. But all of them agreed that the most important thing in the world was to create correct heaps, and scatter incorrect ones.

  Why the Pebblesorting People cared so much, is lost to this history—maybe a Fisherian runaway sexual selection, started by sheer accident a million years ago? Or maybe a strange work of sentient art, created by more powerful minds and abandoned?

  But it mattered so drastically to them, this sorting of pebbles, that all the Pebblesorting philosophers said in unison that pebble-heap-sorting was the very meaning of their lives: and held that the only justified reason to eat was to sort pebbles, the only justified reason to mate was to sort pebbles, the only justified reason to participate in their world economy was to efficiently sort pebbles.

  The Pebblesorting People all agreed on that, but they didn’t always agree on which heaps were correct or incorrect.

  In the early days of Pebblesorting civilization, the heaps they made were mostly small, with counts like 23 or 29; they couldn’t tell if larger heaps were correct or not. Three millennia ago, the Great Leader Biko made a heap of 91 pebbles and proclaimed it correct, and his legions of admiring followers made more heaps likewise. But over a handful of centuries, as the power of the Bikonians faded, an intuition began to accumulate among the smartest and most educated that a heap of 91 pebbles was incorrect. Until finally they came to know what they had done: and they scattered all the heaps of 91 pebbles. Not without flashes of regret, for some of those heaps were great works of art, but incorrect. They even scattered Biko’s original heap, made of 91 precious gemstones each of a different type and color.

  And no civilization since has seriously doubted that a heap of 91 is incorrect.

  Today, in these wiser times, the size of the heaps that Pebblesorters dare attempt has grown very much larger—which all agree would be a most great and excellent thing, if only they could ensure the heaps were really correct. Wars have been fought between countries that disagree on which heaps are correct: the Pebblesorters will never forget the Great War of 1957, fought between Y’ha-nthlei and Y’not’ha-nthlei, over heaps of size 1957. That war, which saw the first use of nuclear weapons on the Pebblesorting Planet, finally ended when the Y’not’ha-nthleian philosopher At’gra’len’ley exhibited a heap of 103 pebbles and a heap of 19 pebbles side-by-side. So persuasive was this argument that even Y’ha-nthlei reluctantly conceded that it was best to stop building heaps of 1957 pebbles, at least for the time being.

  Since the Great War of 1957, countries have been reluctant to openly endorse or condemn heaps of large size, since this leads so easily to war. Indeed, some Pebblesorting philosophers—who seem to take a tangible delight in shocking others with their cynicism—have entirely denied the existence of pebble-sorting progress; they suggest that opinions about pebbles have simply been a random walk over time, with no coherence to them, the illusion of progress created by condemning all dissimilar pasts as incorrect. The philosophers point to the disagreement over pebbles of large size, as proof that there is nothing that makes a heap of size 91 really incorrect—that it was simply fashionable to build such heaps at one point in time, and then at another point, fashionable to condemn them. “But . . . 13!” carries no truck with them; for to regard “13!” as a persuasive counterargument is only another convention, they say. The Heap Relativists claim that their philosophy may help prevent future disasters like the Great War of 1957, but it is widely considered to be a philosophy of despair.

  Now the question of what makes a heap correct or incorrect has taken on new urgency; for the Pebblesorters may shortly embark on the creation of self-improving Artificial Intelligences. The Heap Relativists have warned against this project: They say that AIs, not being of the species Pebblesorter sapiens, may form their own culture with entirely different ideas of which heaps are correct or incorrect. “They could decide that heaps of 8 pebbles are correct,” say the Heap Relativists, “and while ultimately they’d be no righter or wronger than us, still, our civilization says we shouldn’t build such heaps. It is not in our interest to create AI, unless all the computers have bombs strapped to them, so that even if the AI thinks a heap of 8 pebbles is correct, we can force it to build heaps of 7 pebbles instead. Otherwise, KABOOM!”

  But this, to most Pebblesorters, seems absurd. Surely a sufficiently powerful AI—especially the “superintelligence” some transpebblesorterists go on about—would be able to see at a glance which heaps were correct or incorrect! The thought of something with a brain the size of a planet thinking that a heap of 8 pebbles was correct is just too absurd to be worth talking about.

  Indeed, it is an utterly futile project to constrain how a superintelligence sorts pebbles into heaps. Suppose that Great Leader Biko had been able, in his primitive era, to construct a self-improving AI; and he had built it as an expected utility maximizer whose utility function told it to create as many heaps as possible of size 91. Surely, when this AI improved itself far enough, and became smart enough, then it would see at a glance that this utility function was incorrect; and, having the ability to modify its own source code, it would rewrite its utility function to value more reasonable heap sizes, like 101 or 103.

  And certainly not heaps of size 8. That would just be stupid. Any mind that stupid is too dumb to be a threat.

  Reassured by such common sense, the Pebblesorters pour full speed ahead on their project to throw together lots of algorithms at random on big computers until some kind of intelligence emerges. The whole history of civilization has shown that richer, smarter, better educated civilizations are likely to agree about heaps that their ancestors once disputed. Sure, there are then larger heaps to argue about—but the further technology has advanced, the larger the heaps that have been agreed upon and constructed.

  Indeed, intelligence itself has always correlated with making correct heaps—the nearest evolutionary cousins to the Pebblesorters, the Pebpanzees, make heaps of only size 2 or 3, and occasionally stupid heaps like 9. And other, even less intelligent creatures, like fish, make no heaps at all.

  Smarter minds equal smarter heaps. Why would that trend break?

  *

  269

  2-Place and 1-Place Words

  I have previously spoken of the ancient, pulp-era magazine covers that showed a bug-eyed monster carrying off a girl in a torn dress; and about how people think as if sexiness is an inherent property of a sexy entity, without dependence on the admirer.

  “Of course the bug-eyed monster will prefer human females to its own kind,” says the artist (who we’ll call Fred); “it can see that human females have soft, pleasant skin instead of slimy scales. It may be an alien, but it’s not stupid—why are you expecting it to make such a basic mistake about sexiness?”

  What is Fred’s error? It is treating a function of 2 arguments (“2-place function”):

  Sexiness: Admirer, Entity → [0,∞),

  as though it were a function of 1 argument (“1-place function”):

  Sexiness: Entity → [0,∞).

  If Sexiness is treated as a function that accepts only one Entity as its argument, then of course Sexiness will appear to depend only on the Entity, with nothing else being relevant.
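  As a sketch of the type error in code (my own illustration, with invented scores): the real function takes both arguments, and Fred’s mistake amounts to silently fixing the Admirer slot at himself.

  # Illustrative sketch: the 2-place function vs. Fred's projection of it
  # into a 1-place function of the Entity alone.

  def sexiness(admirer, entity):              # Sexiness: Admirer, Entity -> score
      scores = {("Fred", "woman"): 5.0,
                ("Bloogah", "woman"): 0.01,
                ("Bloogah", "slime mold"): 3.0}
      return scores.get((admirer, entity), 0.0)

  def freds_projected_sexiness(entity):       # the fallacious 1-place version:
      return sexiness("Fred", entity)         # Fred's own slot is baked in

  # Fred expects Bloogah to share his outputs; the 2-place function disagrees.
  assert freds_projected_sexiness("woman") != sexiness("Bloogah", "woman")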

  When you think about a two-place function as though it were a one-place function, you end up with a Variable Question Fallacy / Mind Projection Fallacy. Like trying to determine whether a building is intrinsically on the left or on the right side of the road, independent of anyone’s travel direction.

  An alternative and equally valid standpoint is that “sexiness” does refer to a one-place function—but each speaker uses a different one-place function to decide who to kidnap and ravish. Who says that just because Fred, the artist, and Bloogah, the bug-eyed monster, both use the word “sexy,” they must mean the same thing by it?

  If you take this viewpoint, there is no paradox in speaking of some woman intrinsically having 5 units of Fred::Sexiness. All onlookers can agree on this fact, once Fred::Sexiness has been specified in terms of curves, skin texture, clothing, status cues, etc. This specification need make no mention of Fred, only the woman to be evaluated.

  It so happens that Fred, himself, uses this algorithm to select flirtation targets. But that doesn’t mean the algorithm itself has to mention Fred. So Fred’s Sexiness function really is a function of one argument—the woman—on this view. I called it Fred::Sexiness, but remember that this name refers to a function that is being described independently of Fred. Maybe it would be better to write:

  Fred::Sexiness == Sexiness_20934.

  It is an empirical fact about Fred that he uses the function Sexiness_20934 to evaluate potential mates. Perhaps John uses exactly the same algorithm; it doesn’t matter where it comes from once we have it.

  And similarly, the same woman has only 0.01 units of Sexiness_72546, whereas a slime mold has 3 units of Sexiness_72546. It happens to be an empirical fact that Bloogah uses Sexiness_72546 to decide who to kidnap; that is, Bloogah::Sexiness names the fixed Bloogah-independent mathematical object that is the function Sexiness_72546.

  Once we say that the woman has 0.01 units of Sexiness_72546 and 5 units of Sexiness_20934, all observers can agree on this without paradox.
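  The same point as a sketch in code (the numeric suffixes are the essay’s arbitrary labels): each standard is a fixed 1-place function whose body never mentions its user, so every observer computes the same outputs.

  def sexiness_20934(entity):      # the function Fred happens to use
      return {"woman": 5.0, "slime mold": 0.0}.get(entity, 0.0)

  def sexiness_72546(entity):      # the function Bloogah happens to use
      return {"woman": 0.01, "slime mold": 3.0}.get(entity, 0.0)

  assert sexiness_20934("woman") == 5.0
  assert sexiness_72546("woman") == 0.01   # no paradox: two different functions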

  And the two 2-place and 1-place views can be unified using the concept of “currying,” named after the mathematician Haskell Curry. Currying is a technique allowed in certain programming languages, where e.g. instead of writing

  x = plus(2, 3) (x = 5),

  you can also write

  y = plus(2) (y is now a “curried” form of the function plus, which has eaten a 2)

  x = y(3) (x = 5)

  z = y(7) (z = 9).

  So plus is a 2-place function, but currying plus—letting it eat only one of its two required arguments—turns it into a 1-place function that adds 2 to any input. (Similarly, you could start with a 7-place function, feed it 4 arguments, and the result would be a 3-place function, etc.)

  A true purist would insist that all functions should be viewed, by definition, as taking exactly one argument. On this view, plus accepts one numeric input, and outputs a new function; and this new function has one numeric input and finally outputs a number. On this view, when we write plus(2, 3) we are really computing plus(2) to get a function that adds 2 to any input, and then applying the result to 3. A programmer would write this as:

  plus: int → (int → int).

  This says that plus takes an int as an argument, and returns a function of type int → int.
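  Python does not curry automatically, but the one-argument-at-a-time form can be written out directly, and functools.partial gives the same effect for an ordinary 2-place function (a sketch mirroring the plus example above):

  from functools import partial

  def plus(a):
      """plus: int -> (int -> int): eat one argument, return a 1-place function."""
      def add_a(b):
          return a + b
      return add_a

  y = plus(2)        # y is the curried form of plus, which has eaten a 2
  assert y(3) == 5
  assert y(7) == 9

  # The same effect via the standard library, starting from a 2-place function:
  y2 = partial(lambda a, b: a + b, 2)
  assert y2(3) == 5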

  Translating the metaphor back into the human use of words, we could imagine that “sexiness” starts by eating an Admirer, and spits out the fixed mathematical object that describes how the Admirer currently evaluates pulchritude. It is an empirical fact about the Admirer that their intuitions of desirability are computed in a way that is isomorphic to this mathematical function.

  Then the mathematical object spit out by currying Sexiness(Admirer) can be applied to the Woman. If the Admirer was originally Fred, Sexiness(Fred) will first return Sexiness_20934. We can then say it is an empirical fact about the Woman, independently of Fred, that Sexiness_20934(Woman) = 5.
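  Putting the sketches together (reusing the toy functions defined earlier; all names here are illustrative): Sexiness eats the Admirer first and returns the fixed 1-place function, which is then applied to the Woman.

  def curried_sexiness(admirer):
      """Sexiness: Admirer -> (Entity -> score)."""
      return {"Fred": sexiness_20934, "Bloogah": sexiness_72546}[admirer]

  assert curried_sexiness("Fred") is sexiness_20934    # Sexiness(Fred) = Sexiness_20934
  assert curried_sexiness("Fred")("woman") == 5.0      # a fact statable without Fred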

  In Hilary Putnam’s “Twin Earth” thought experiment, there was a tremendous philosophical brouhaha over whether it makes sense to postulate a Twin Earth that is just like our own, except that instead of water being H2O, water is a different transparent flowing substance, XYZ. And furthermore, set the time of the thought experiment a few centuries ago, so in neither our Earth nor the Twin Earth does anyone know how to test the alternative hypotheses of H2O vs. XYZ. Does the word “water” mean the same thing in that world as in this one?

 
