
Rationality: From AI to Zombies


by Eliezer Yudkowsky


  This is the Great Failure of Imagination. Don’t think that it’s just about science fiction, or even just about AI. The inability to imagine the alien is the inability to see yourself—the inability to understand your own specialness. Who can see a human camouflaged against a human background?

  *

  145

  Optimization and the Intelligence Explosion

  Among the topics I haven’t delved into here is the notion of an optimization process. Roughly, this is the idea that your power as a mind is your ability to hit small targets in a large search space—this can be either the space of possible futures (planning) or the space of possible designs (invention).

  Suppose you have a car, and suppose we already know that your preferences involve travel. Now suppose that you take all the parts in the car, or all the atoms, and jumble them up at random. It’s very unlikely that you’ll end up with a travel-artifact at all, even so much as a wheeled cart; let alone a travel-artifact that ranks as high in your preferences as the original car. So, relative to your preference ordering, the car is an extremely improbable artifact. The power of an optimization process is that it can produce this kind of improbability.

  You can view both intelligence and natural selection as special cases of optimization: processes that hit, in a large search space, very small targets defined by implicit preferences. Natural selection prefers more efficient replicators. Human intelligences have more complex preferences. Neither evolution nor humans have consistent utility functions, so viewing them as “optimization processes” is understood to be an approximation. You’re trying to get at the sort of work being done, not claim that humans or evolution do this work perfectly.

  This is how I see the story of life and intelligence—as a story of improbably good designs being produced by optimization processes. The “improbability” here is improbability relative to a random selection from the design space, not improbability in an absolute sense—if you have an optimization process around, then “improbably” good designs become probable.
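  To make "small targets in a large search space" concrete, here is a minimal sketch in Python. None of it comes from the essay: the target string, the alphabet, and the match-count scoring rule are arbitrary stand-ins for a preference ordering over designs. It contrasts jumbling a design at random (the shuffled car) with a crude hill-climbing optimizer that keeps whichever single-character mutation scores at least as well.

  import random
  import string

  # Illustrative toy: a 19-character "design space" and an arbitrary target design.
  TARGET = "wheeled travel cart"
  ALPHABET = string.ascii_lowercase + " "

  def score(candidate: str) -> int:
      """Preference ordering: how many positions match the target design."""
      return sum(c == t for c, t in zip(candidate, TARGET))

  def random_design() -> str:
      """Jumble the parts at random, i.e. draw uniformly from the design space."""
      return "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))

  def hill_climb(steps: int = 5000) -> str:
      """A crude optimizer: keep any single-character mutation that scores at least as well."""
      current = random_design()
      for _ in range(steps):
          i = random.randrange(len(TARGET))
          mutant = current[:i] + random.choice(ALPHABET) + current[i + 1:]
          if score(mutant) >= score(current):
              current = mutant
      return current

  if __name__ == "__main__":
      best_random = max((random_design() for _ in range(5000)), key=score)
      optimized = hill_climb()
      print("best of 5000 random jumbles:", repr(best_random), score(best_random))
      print("hill-climbed design:        ", repr(optimized), score(optimized))

  With roughly 27^19 possible strings, five thousand random draws almost never get more than a few characters right, while the optimizer lands on or near the target. That gap, measured against the same preference ordering, is the improbability an optimization process manufactures.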

  Looking over the history of optimization on Earth up until now, the first step is to conceptually separate the meta level from the object level—separate the structure of optimization from that which is optimized.

  If you consider biology in the absence of hominids, then on the object level we have things like dinosaurs and butterflies and cats. On the meta level we have things like sexual recombination and natural selection of asexual populations. The object level, you will observe, is rather more complicated than the meta level. Natural selection is not an easy subject and it involves math. But if you look at the anatomy of a whole cat, the cat has dynamics immensely more complicated than “mutate, recombine, reproduce.”

  This is not surprising. Natural selection is an accidental optimization process that basically just started happening one day in a tidal pool somewhere. A cat is the subject of billions of years of evolution.

  Cats have brains, of course, which operate to learn over a lifetime; but at the end of the cat’s lifetime, that information is thrown away, so it does not accumulate. The cumulative effects of cat-brains upon the world as optimizers, therefore, are relatively small.

  Or consider a bee brain, or a beaver brain. A bee builds hives, and a beaver builds dams; but they didn’t figure out how to build them from scratch. A beaver can’t figure out how to build a hive; a bee can’t figure out how to build a dam.

  So animal brains—up until recently—were not major players in the planetary game of optimization; they were pieces but not players. Compared to evolution, brains lacked both generality of optimization power (they could not produce the amazing range of artifacts produced by evolution) and cumulative optimization power (their products did not accumulate complexity over time). For more on this theme see Protein Reinforcement and DNA Consequentialism.

  Very recently, certain animal brains have begun to exhibit both generality of optimization power (producing an amazingly wide range of artifacts, in time scales too short for natural selection to play any significant role) and cumulative optimization power (artifacts of increasing complexity, as a result of skills passed on through language and writing).

  Natural selection takes hundreds of generations to do anything and millions of years for de novo complex designs. Human programmers can design a complex machine with a hundred interdependent elements in a single afternoon. This is not surprising, since natural selection is an accidental optimization process that basically just started happening one day, whereas humans are optimized optimizers handcrafted by natural selection over millions of years.

  The wonder of evolution is not how well it works, but that it works at all without being optimized. This is how optimization bootstrapped itself into the universe—starting, as one would expect, from an extremely inefficient accidental optimization process. Which is not the accidental first replicator, mind you, but the accidental first process of natural selection. Distinguish the object level and the meta level!

  Since the dawn of optimization in the universe, a certain structural commonality has held across both natural selection and human intelligence . . .

  Natural selection selects on genes, but generally speaking, the genes do not turn around and optimize natural selection. The invention of sexual recombination is an exception to this rule, and so is the invention of cells and DNA. And you can see both the power and the rarity of such events, by the fact that evolutionary biologists structure entire histories of life on Earth around them.

  But if you step back and take a human standpoint—if you think like a programmer—then you can see that natural selection is still not all that complicated. We’ll try bundling different genes together? We’ll try separating information storage from moving machinery? We’ll try randomly recombining groups of genes? On an absolute scale, these are the sort of bright ideas that any smart hacker comes up with during the first ten minutes of thinking about system architectures.

  Because natural selection started out so inefficient (as a completely accidental process), this tiny handful of meta-level improvements feeding back in from the replicators—nowhere near as complicated as the structure of a cat—structure the evolutionary epochs of life on Earth.

  And after all that, natural selection is still a blind idiot of a god. Gene pools can evolve to extinction, despite all cells and sex.

  Now natural selection does feed on itself in the sense that each new adaptation opens up new avenues of further adaptation; but that takes place on the object level. The gene pool feeds on its own complexity—but only thanks to the protected interpreter of natural selection that runs in the background, and that is not itself rewritten or altered by the evolution of species.

  Likewise, human beings invent sciences and technologies, but we have not yet begun to rewrite the protected structure of the human brain itself. We have a prefrontal cortex and a temporal cortex and a cerebellum, just like the first inventors of agriculture. We haven’t started to genetically engineer ourselves. On the object level, science feeds on science, and each new discovery paves the way for new discoveries—but all that takes place with a protected interpreter, the human brain, running untouched in the background.

  We have meta-level inventions like science, that try to instruct humans in how to think. But the first person to invent Bayes’s Theorem did not become a Bayesian; they could not rewrite themselves, lacking both that knowledge and that power. Our significant innovations in the art of thinking, like writing and science, are so powerful that they structure the course of human history; but they do not rival the brain itself in complexity, and their effect upon the brain is comparatively shallow.

  The present state of the art in rationality training is not sufficient to turn an arbitrarily selected mortal into Albert Einstein, which shows the power of a few minor genetic quirks of brain design compared to all the self-help books ever written in the twentieth century.

  Because the brain hums away invisibly in the background, people tend to overlook its contribution and take it for granted; and talk as if the simple instruction to “Test ideas by experiment,” or the p < 0.05 significance rule, were the same order of contribution as an entire human brain. Try telling chimpanzees to test their ideas by experiment and see how far you get.

  Now . . . some of us want to intelligently design an intelligence that would be capable of intelligently redesigning itself, right down to the level of machine code.

  The machine code at first, and the laws of physics later, would be a protected level of a sort. But that “protected level” would not contain the dynamic of optimization; the protected levels would not structure the work. The human brain does quite a bit of optimization on its own, and screws up on its own, no matter what you try to tell it in school. But this fully wraparound recursive optimizer would have no protected level that was optimizing. All the structure of optimization would be subject to optimization itself.

  And that is a sea change which breaks with the entire past since the first replicator, because it breaks the idiom of a protected meta level.

  The history of Earth up until now has been a history of optimizers spinning their wheels at a constant rate, generating a constant optimization pressure. And creating optimized products, not at a constant rate, but at an accelerating rate, because of how object-level innovations open up the pathway to other object-level innovations. But that acceleration is taking place with a protected meta level doing the actual optimizing. Like a search that leaps from island to island in the search space, and good islands tend to be adjacent to even better islands, but the jumper doesn’t change its legs. Occasionally, a few tiny little changes manage to hit back to the meta level, like sex or science, and then the history of optimization enters a new epoch and everything proceeds faster from there.

  Imagine an economy without investment, or a university without language, a technology without tools to make tools. Once in a hundred million years, or once in a few centuries, someone invents a hammer.

  That is what optimization has been like on Earth up until now.

  When I look at the history of Earth, I don’t see a history of optimization over time. I see a history of optimization power in, and optimized products out. Up until now, thanks to the existence of almost entirely protected meta-levels, it’s been possible to split up the history of optimization into epochs, and, within each epoch, graph the cumulative object-level optimization over time, because the protected level is running in the background and is not itself changing within an epoch.

  What happens when you build a fully wraparound, recursively self-improving AI? Then you take the graph of “optimization in, optimized out,” and fold the graph in on itself. Metaphorically speaking.

  If the AI is weak, it does nothing, because it is not powerful enough to significantly improve itself—like telling a chimpanzee to rewrite its own brain.

  If the AI is powerful enough to rewrite itself in a way that increases its ability to make further improvements, and this reaches all the way down to the AI’s full understanding of its own source code and its own design as an optimizer . . . then even if the graph of “optimization power in” and “optimized product out” looks essentially the same, the graph of optimization over time is going to look completely different from Earth’s history so far.

  People often say something like, “But what if it requires exponentially greater amounts of self-rewriting for only a linear improvement?” To this the obvious answer is, “Natural selection exerted roughly constant optimization power on the hominid line in the course of coughing up humans; and this doesn’t seem to have required exponentially more time for each linear increment of improvement.”
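  The difference between these two regimes can be caricatured with a toy numerical model, sketched below in Python. To be clear, nothing in it comes from the book; the variables, coefficients, and step count are invented purely for illustration. In one run the optimization power stays fixed while the object-level product compounds; in the other, a fraction of the product is folded back into the optimizer itself.

  # A toy caricature only -- none of these quantities or coefficients come from the essay.
  STEPS = 40
  EFFICIENCY = 0.05  # product gained per step, per unit of power, per unit of existing product
  FEEDBACK = 0.05    # fraction of product fed back into the optimizer (recursive case only)

  def constant_meta_level(steps: int = STEPS) -> list[float]:
      """Protected meta level: optimization power stays fixed, products compound."""
      power, product, history = 1.0, 1.0, []
      for _ in range(steps):
          product += EFFICIENCY * power * product  # object-level innovations enable more innovations
          history.append(product)
      return history

  def recursive_self_improvement(steps: int = STEPS) -> list[float]:
      """No protected meta level: the product also improves the optimizer itself."""
      power, product, history = 1.0, 1.0, []
      for _ in range(steps):
          product += EFFICIENCY * power * product
          power += FEEDBACK * product              # the fold: optimized product feeds back into power
          history.append(product)
      return history

  if __name__ == "__main__":
      fixed = constant_meta_level()
      folded = recursive_self_improvement()
      for t in (9, 24, 39):
          print(f"step {t + 1:2d}:  protected meta level {fixed[t]:12.1f}   recursive {folded[t]:12.1f}")

  The particular numbers are meaningless. The point is that the second curve is not a faster copy of the first; once the product is allowed to rewrite the thing doing the optimizing, the shape of the graph changes, which is the sense in which the graph gets folded in on itself.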

  All of this is still mere analogic reasoning. A full Artificial General Intelligence thinking about the nature of optimization and doing its own AI research and rewriting its own source code is not really like a graph of Earth’s history folded in on itself. It is a different sort of beast. These analogies are at best good for qualitative predictions, and even then, I have a large amount of other beliefs I haven’t yet explained, which are telling me which analogies to make, et cetera.

  But if you want to know why I might be reluctant to extend the graph of biological and economic growth over time, into the future and over the horizon of an AI that thinks at transistor speeds and invents self-replicating molecular nanofactories and improves its own source code, then there is my reason: you are drawing the wrong graph, and it should be optimization power in versus optimized product out, not optimized product versus time.

  *

  146

  Ghosts in the Machine

  People hear about Friendly AI and say—this is one of the top three initial reactions:

  “Oh, you can try to tell the AI to be Friendly, but if the AI can modify its own source code, it’ll just remove any constraints you try to place on it.”

  And where does that decision come from?

  Does it enter from outside causality, rather than being an effect of a lawful chain of causes that started with the source code as originally written? Is the AI the ultimate source of its own free will?

  A Friendly AI is not a selfish AI constrained by a special extra conscience module that overrides the AI’s natural impulses and tells it what to do. You just build the conscience, and that is the AI. If you have a program that computes which decision the AI should make, you’re done. The buck stops immediately.

  At this point, I shall take a moment to quote some case studies from the Computer Stupidities site and Programming subtopic. (I am not linking to this, because it is a fearsome time-trap; you can Google if you dare.)

  I tutored college students who were taking a computer programming course. A few of them didn’t understand that computers are not sentient. More than one person used comments in their Pascal programs to put detailed explanations such as, “Now I need you to put these letters on the screen.” I asked one of them what the deal was with those comments. The reply: “How else is the computer going to understand what I want it to do?” Apparently they would assume that since they couldn’t make sense of Pascal, neither could the computer.

  * * *

  While in college, I used to tutor in the school’s math lab. A student came in because his BASIC program would not run. He was taking a beginner course, and his assignment was to write a program that would calculate the recipe for oatmeal cookies, depending upon the number of people you’re baking for. I looked at his program, and it went something like this:

  10 Preheat oven to 350

  20 Combine all ingredients in a large mixing bowl

  30 Mix until smooth

  * * *

  An introductory programming student once asked me to look at his program and figure out why it was always churning out zeroes as the result of a simple computation. I looked at the program, and it was pretty obvious:

  begin

  read("Number of Apples", apples)

  read("Number of Carrots", carrots)

  read("Price for 1 Apple", a_price)

  read("Price for 1 Carrot", c_price)

  write("Total for Apples", a_total)

  write("Total for Carrots", c_total)

  write("Total", total)

  total = a_total + c_total

  a_total = apples * a_price

  c_total = carrots * c_price

  end

  Me: “Well, your program can’t print correct results before they’re computed.”

  Him: “Huh? It’s logical what the right solution is, and the computer should reorder the instructions the right way.”
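  For contrast, here is roughly what a working version would look like, sketched in Python rather than the original Pascal-style pseudocode. The computer executes statements strictly in the order written, so the totals must be computed before they are printed.

  # Read the inputs first, then compute, then write -- in that order.
  apples = int(input("Number of Apples: "))
  carrots = int(input("Number of Carrots: "))
  a_price = float(input("Price for 1 Apple: "))
  c_price = float(input("Price for 1 Carrot: "))

  a_total = apples * a_price      # the totals now exist...
  c_total = carrots * c_price
  total = a_total + c_total

  print("Total for Apples:", a_total)   # ...before anything tries to print them
  print("Total for Carrots:", c_total)
  print("Total:", total)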

  There’s an instinctive way of imagining the scenario of “programming an AI.” It maps onto a similar-seeming human endeavor: Telling a human being what to do. Like the “program” is giving instructions to a little ghost that sits inside the machine, which will look over your instructions and decide whether it likes them or not.

  There is no ghost who looks over the instructions and decides how to follow them. The program is the AI.

  That doesn’t mean the ghost does anything you wish for, like a genie. It doesn’t mean the ghost does everything you want the way you want it, like a slave of exceeding docility. It means your instruction is the only ghost that’s there, at least at boot time.

  AI is much harder than people instinctively imagined, exactly because you can’t just tell the ghost what to do. You have to build the ghost from scratch, and everything that seems obvious to you, the ghost will not see unless you know how to make the ghost see it. You can’t just tell the ghost to see it. You have to create that-which-sees from scratch.

  If you don’t know how to build something that seems to have some strange ineffable elements like, say, “decision-making,” then you can’t just shrug your shoulders and let the ghost’s free will do the job. You’re left forlorn and ghostless.

  There’s more to building a chess-playing program than building a really fast processor—so the AI will be really smart—and then typing at the command prompt “Make whatever chess moves you think are best.” You might think that, since the programmers themselves are not very good chess players, any advice they tried to give the electronic superbrain would just slow the ghost down. But there is no ghost. You see the problem.

  And there isn’t a simple spell you can perform to—poof!—summon a complete ghost into the machine. You can’t say, “I summoned the ghost, and it appeared; that’s cause and effect for you.” (It doesn’t work if you use the notion of “emergence” or “complexity” as a substitute for “summon,” either.) You can’t give an instruction to the CPU, “Be a good chess player!” You have to see inside the mystery of chess-playing thoughts, and structure the whole ghost from scratch.

  No matter how common-sensical, no matter how logical, no matter how “obvious” or “right” or “self-evident” or “intelligent” something seems to you, it will not happen inside the ghost. Unless it happens at the end of a chain of cause and effect that began with the instructions that you had to decide on, plus any causal dependencies on sensory data that you built into the starting instructions.

 
