Rationality: From AI to Zombies


by Eliezer Yudkowsky


  Beneath the plaques, two sets of tally marks scratched into the wall. Under the plus column, two marks. Under the minus column, five marks. Seven times he had entered this room; five times he had decided not to change his mind; twice he had exited something of a different person. There was no set ratio prescribed, or set range—that would have been a mockery indeed. But if there were no marks in the plus column after a while, you might as well admit that there was no point in having the room, since you didn’t have the ability it stood for. Either that, or you’d been born knowing the truth and right of everything.

  Jeffreyssai seated himself, not facing the plaques, but facing away from them, at the featureless white wall. It was better to have no visual distractions.

  In his mind, he rehearsed first the meta-mnemonic, and then the various sub-mnemonics referenced, for the seven major principles and sixty-two specific techniques that were most likely to prove needful in the Ritual Of Changing One’s Mind. To this, Jeffreyssai added another mnemonic, reminding himself of his own fourteen most embarrassing oversights.

  He did not take a deep breath. Regular breathing was best.

  And then he asked himself the question.

  *

  Book III

  The Machine in the Ghost

  Minds: An Introduction

  Interlude: The Power of Intelligence

  L. The Simple Math of Evolution

  131. An Alien God

  132. The Wonder of Evolution

  133. Evolutions Are Stupid (But Work Anyway)

  134. No Evolutions for Corporations or Nanodevices

  135. Evolving to Extinction

  136. The Tragedy of Group Selectionism

  137. Fake Optimization Criteria

  138. Adaptation-Executers, Not Fitness-Maximizers

  139. Evolutionary Psychology

  140. An Especially Elegant Evolutionary Psychology Experiment

  141. Superstimuli and the Collapse of Western Civilization

  142. Thou Art Godshatter

  M. Fragile Purposes

  143. Belief in Intelligence

  144. Humans in Funny Suits

  145. Optimization and the Intelligence Explosion

  146. Ghosts in the Machine

  147. Artificial Addition

  148. Terminal Values and Instrumental Values

  149. Leaky Generalizations

  150. The Hidden Complexity of Wishes

  151. Anthropomorphic Optimism

  152. Lost Purposes

  N. A Human’s Guide to Words

  153. The Parable of the Dagger

  154. The Parable of Hemlock

  155. Words as Hidden Inferences

  156. Extensions and Intensions

  157. Similarity Clusters

  158. Typicality and Asymmetrical Similarity

  159. The Cluster Structure of Thingspace

  160. Disguised Queries

  161. Neural Categories

  162. How An Algorithm Feels From Inside

  163. Disputing Definitions

  164. Feel the Meaning

  165. The Argument from Common Usage

  166. Empty Labels

  167. Taboo Your Words

  168. Replace the Symbol with the Substance

  169. Fallacies of Compression

  170. Categorizing Has Consequences

  171. Sneaking in Connotations

  172. Arguing “By Definition”

  173. Where to Draw the Boundary?

  174. Entropy, and Short Codes

  175. Mutual Information, and Density in Thingspace

  176. Superexponential Conceptspace, and Simple Words

  177. Conditional Independence, and Naive Bayes

  178. Words as Mental Paintbrush Handles

  179. Variable Question Fallacies

  180. 37 Ways That Words Can Be Wrong

  Interlude: An Intuitive Explanation of Bayes’s Theorem

  Minds: An Introduction

  by Rob Bensinger

  You’re a mind, and that puts you in a pretty strange predicament.

  Very few things get to be minds. You’re that odd bit of stuff in the universe that can form predictions and make plans, weigh and revise beliefs, suffer, dream, notice ladybugs, or feel a sudden craving for mango. You can even form, inside your mind, a picture of your whole mind. You can reason about your own reasoning process, and work to bring its operations more in line with your goals.

  You’re a mind, implemented on a human brain. And it turns out that a human brain, for all its marvelous flexibility, is a lawful thing, a thing of pattern and routine. Your mind can follow a routine for a lifetime, without ever once noticing that it is doing so. And these routines can have great consequences.

  When a mental pattern serves you well, we call that “rationality.”

  You exist as you are, hard-wired to exhibit certain species of rationality and certain species of irrationality, because of your ancestry. You, and all life on Earth, are descended from ancient self-replicating molecules. This replication process was initially clumsy and haphazard, and soon yielded replicable differences between the replicators. “Evolution” is our name for the change in these differences over time.

  Since some of these reproducible differences impact reproducibility—a phenomenon called “selection”—evolution has resulted in organisms suited to reproduction in environments like the ones their ancestors had. Everything about you is built on the echoes of your ancestors’ struggles and victories.
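
  To make that mechanism concrete, here is a minimal simulation sketch in Python (illustrative only; the carrying capacity, mutation rate, and heritable “copy rate” are invented for the example, not drawn from biology). Variants that happen to copy themselves faster come to dominate the pool, without foresight or design anywhere in the process:

```python
import random

def generation(pop, capacity=500, mutation=0.01):
    """One generation: each replicator leaves children in proportion to its
    heritable copy rate; copies occasionally mutate; the pool is then trimmed
    to a fixed carrying capacity, so faster copiers crowd out slower ones."""
    children = []
    for rate in pop:
        for _ in range(rate):  # reproduce in proportion to copy rate
            child = rate
            if random.random() < mutation:  # rare copying error
                child = max(1, child + random.choice([-1, 1]))
            children.append(child)
    random.shuffle(children)
    return children[:capacity]

pop = [1] * 500  # begin with the slowest copiers only
for _ in range(40):
    pop = generation(pop)
print(sum(pop) / len(pop))  # mean copy rate has drifted upward: "selection"
```

  Nothing in the loop aims at anything; “fitness” is only bookkeeping about which variants remain.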

  And so here you are: a mind, carved from weaker minds, seeking to understand your own inner workings, that they can be improved upon—improved upon relative to your goals, and not those of your designer, evolution. What useful policies and insights can we take away from knowing that this is our basic situation?

  Ghosts and Machines

  Our brains, in their small-scale structure and dynamics, look like many other mechanical systems. Yet we rarely think of our minds in the same terms we think of objects in our environments or organs in our bodies. Our basic mental categories—belief, decision, word, idea, feeling, and so on—bear little resemblance to our physical categories.

  Past philosophers have taken this observation and run with it, arguing that minds and brains are fundamentally distinct and separate phenomena. This is the view the philosopher Gilbert Ryle called “the dogma of the Ghost in the Machine.”1 But modern scientists and philosophers who have rejected dualism haven’t necessarily replaced it with a better predictive model of how the mind works. Practically speaking, our purposes and desires still function like free-floating ghosts, like a magisterium cut off from the rest of our scientific knowledge. We can talk about “rationality” and “bias” and “how to change our minds,” but if those ideas are still imprecise and unconstrained by any overarching theory, our scientific-sounding language won’t protect us from making the same kinds of mistakes as those whose theoretical posits include spirits and essences.

  Interestingly, the mystery and mystification surrounding minds doesn’t just obscure our view of humans. It also accrues to systems that seem mind-like or purposeful in evolutionary biology and artificial intelligence (AI). Perhaps, if we cannot readily glean what we are from looking at ourselves, we can learn more by using obviously inhuman processes as a mirror.

  There are many ghosts to learn from here—ghosts past, and present, and yet to come. And these illusions are real cognitive events, real phenomena that we can study and explain. If there appears to be a ghost in the machine, that appearance is itself the hidden work of a machine.

  The first sequence of The Machine in the Ghost, “The Simple Math of Evolution,” aims to communicate the dissonance and divergence between our hereditary history, our present-day biology, and our ultimate aspirations. This will require digging deeper than is common in introductions to evolution for non-biologists, which often restrict their attention to surface-level features of natural selection.

  The third sequence, “A Human’s Guide to Words,” discusses the basic relationship between cognition and concept formation. This is followed by a longer essay introducing Bayesian inference.

  Bridging the gap between these topics, “Fragile Purposes” abstracts from human cognition and evolution to the idea of minds and goal-directed systems at their most general. These essays serve the secondary purpose of explaining the author’s general approach to philosophy and the science of rationality, which is strongly informed by his work in AI.

  Rebuilding Intelligence

  Yudkowsky is a decision theorist and mathematician who works on foundational issues in Artificial General Intelligence (AGI), the theoretical study of domain-general problem-solving systems. Yudkowsky’s work in AI has been a major driving force behind his exploration of the psychology of human rationality, as he noted in his very first blog post on Overcoming Bias, “The Martial Art of Rationality”:

  Such understanding as I have of rationality, I acquired in the course of wrestling with the challenge of Artificial General Intelligence (an endeavor which, to actually succeed, would require sufficient mastery of rationality to build a complete working rationalist out of toothpicks and rubber bands). In most ways the AI problem is enormously more demanding than the personal art of rationality, but in some ways it is actually easier. In the martial art of mind, we need to acquire the real-time procedural skill of pulling the right levers at the right time on a large, pre-existing thinking machine whose innards are not end-user-modifiable. Some of the machinery is optimized for evolutionary selection pressures that run directly counter to our declared goals in using it. Deliberately we decide that we want to seek only the truth; but our brains have hardwired support for rationalizing falsehoods. [ . . . ]

  Trying to synthesize a personal art of rationality, using the science of rationality, may prove awkward: One imagines trying to invent a martial art using an abstract theory of physics, game theory, and human anatomy. But humans are not reflectively blind; we do have a native instinct for introspection. The inner eye is not sightless; but it sees blurrily, with systematic distortions. We need, then, to apply the science to our intuitions, to use the abstract knowledge to correct our mental movements and augment our metacognitive skills. We are not writing a computer program to make a string puppet execute martial arts forms; it is our own mental limbs that we must move. Therefore we must connect theory to practice. We must come to see what the science means, for ourselves, for our daily inner life.

  From Yudkowsky’s perspective, I gather, talking about human rationality without saying anything interesting about AI is about as difficult as talking about AI without saying anything interesting about rationality.

  In the long run, Yudkowsky predicts that AI will come to surpass humans in an “intelligence explosion,” a scenario in which self-modifying AI improves its own ability to productively redesign itself, kicking off a rapid succession of further self-improvements. The term “technological singularity” is sometimes used in place of “intelligence explosion”; until January 2013, MIRI was named “the Singularity Institute for Artificial Intelligence” and hosted an annual Singularity Summit. Since then, Yudkowsky has come to favor I. J. Good’s older term, “intelligence explosion,” to help distinguish his views from other futurist predictions, such as Ray Kurzweil’s exponential technological progress thesis.2
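
  The difference between these two predictions can be seen in a toy numerical sketch (made-up constants and assumed growth rules, not a claim about real AI): in one regime capability grows by a fixed factor per step, while in the other the growth factor itself rises with capability, so progress compounds on itself:

```python
def steady_exponential(steps, rate=1.1):
    """Capability grows by a fixed factor each step (a Kurzweil-style curve)."""
    c = 1.0
    for _ in range(steps):
        c *= rate
    return c

def recursive_improvement(steps, k=0.1):
    """The growth factor itself scales with current capability, standing in
    for a system that gets better at making itself better."""
    c = 1.0
    for _ in range(steps):
        c *= 1.0 + k * c  # the improvement rate depends on capability itself
    return c

print(steady_exponential(20))     # ~6.7: smooth, predictable growth
print(recursive_improvement(20))  # astronomically larger: feedback compounds
```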

  Technologies like smarter-than-human AI seem likely to result in large societal upheavals, for better or for worse. Yudkowsky coined the term “Friendly AI theory” to refer to research into techniques for aligning an AGI’s preferences with the preferences of humans. At this point, very little is known about when generally intelligent software might be invented, or what safety approaches would work well in such cases. Present-day autonomous AI can already be quite challenging to verify and validate with much confidence, and many current techniques are not likely to generalize to more intelligent and adaptive systems. “Friendly AI” is therefore closer to a menagerie of basic mathematical and philosophical questions than to a well-specified set of programming objectives.

  As of 2015, Yudkowsky’s views on the future of AI continue to be debated by technology forecasters and AI researchers in industry and academia, who have yet to converge on a consensus position. Nick Bostrom’s book Superintelligence provides a big-picture summary of the many moral and strategic questions raised by smarter-than-human AI.3

  For a general introduction to the field of AI, the most widely used textbook is Russell and Norvig’s Artificial Intelligence: A Modern Approach.4 In a chapter discussing the moral and philosophical questions raised by AI, Russell and Norvig note the technical difficulty of specifying good behavior in strongly adaptive AI:

  [Yudkowsky] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. We can’t just give a program a static utility function, because circumstances, and our desired responses to circumstances, change over time.

  Disturbed by the possibility that future progress in AI, nanotechnology, biotechnology, and other fields could endanger human civilization, Bostrom and Ćirković compiled the first academic anthology on the topic, Global Catastrophic Risks.5 The most extreme of these are the existential risks, risks that could result in the permanent stagnation or extinction of humanity.6

  People (experts included) tend to be extraordinarily bad at forecasting major future events (new technologies included). Part of Yudkowsky’s goal in discussing rationality is to figure out which biases are interfering with our ability to predict and prepare for big upheavals well in advance. Yudkowsky’s contributions to the Global Catastrophic Risks volume, “Cognitive biases potentially affecting judgement of global risks” and “Artificial intelligence as a positive and negative factor in global risk,” tie together his research in cognitive science and AI. Yudkowsky and Bostrom summarize near-term concerns along with long-term ones in a chapter of the Cambridge Handbook of Artificial Intelligence, “The ethics of artificial intelligence.”7

  Though this is a book about human rationality, the topic of AI has relevance as a source of simple illustrations of aspects of human cognition. Long-term technology forecasting is also one of the more important applications of Bayesian rationality, which can model correct reasoning even in domains where the data is scarce or equivocal.
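
  For readers who have not yet met it, Bayes’s Theorem is the engine behind that kind of reasoning: the probability of a hypothesis H given evidence E is P(H|E) = P(E|H) P(H) / P(E). A worked toy example in Python follows (every probability here is invented purely for illustration):

```python
# Updating a forecast on one piece of equivocal evidence, via Bayes's Theorem.
prior = 0.10               # P(H): initial credence in the hypothesis
p_e_given_h = 0.50         # P(E | H): chance of the evidence if H is true
p_e_given_not_h = 0.10     # P(E | not-H): chance of the evidence otherwise

# P(E), by the law of total probability
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)

posterior = p_e_given_h * prior / p_e
print(round(posterior, 3)) # 0.357: a 5:1 likelihood ratio moves 10% to ~36%
```

  Three numbers, a prior and two likelihoods, are all the theorem ever needs, which is part of what makes it serviceable when data is scarce.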

  Knowing the design can tell you much about the designer; and knowing the designer can tell you much about the design.

  We’ll begin, then, by inquiring into what our own designer can teach us about ourselves.

  *

  1. Gilbert Ryle, The Concept of Mind (University of Chicago Press, 1949).

  2. Irving John Good, “Speculations Concerning the First Ultraintelligent Machine,” in Advances in Computers, ed. Franz L. Alt and Morris Rubinoff, vol. 6 (New York: Academic Press, 1965), 31–88, doi:10.1016/S0065-2458(08)60418-0.

  3. Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford University Press, 2014).

  4. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. (Upper Saddle River, NJ: Prentice-Hall, 2010).

  5. Nick Bostrom and Milan M. Ćirković, eds., Global Catastrophic Risks (New York: Oxford University Press, 2008).

  6. An example of a possible existential risk is the “grey goo” scenario, in which molecular robots designed to efficiently self-replicate do their job too well, rapidly outcompeting living organisms as they consume the Earth’s available matter.

  7. Nick Bostrom and Eliezer Yudkowsky, “The Ethics of Artificial Intelligence,” in The Cambridge Handbook of Artificial Intelligence, ed. Keith Frankish and William Ramsey (New York: Cambridge University Press, 2014).

  Interlude

  The Power of Intelligence

  In our skulls we carry around three pounds of slimy, wet, grayish tissue, corrugated like crumpled toilet paper.

  You wouldn’t think, to look at the unappetizing lump, that it was some of the most powerful stuff in the known universe. If you’d never seen an anatomy textbook, and you saw a brain lying in the street, you’d say “Yuck!” and try not to get any of it on your shoes. Aristotle thought the brain was an organ that cooled the blood. It doesn’t look dangerous.

  Five million years ago, the ancestors of lions ruled the day, the ancestors of wolves roamed the night. The ruling predators were armed with teeth and claws—sharp, hard cutting edges, backed up by powerful muscles. Their prey, in self-defense, evolved armored shells, sharp horns, toxic venoms, camouflage. The war had gone on through hundreds of eons and countless arms races. Many a loser had been removed from the game, but there was no sign of a winner. Where one species had shells, another species would evolve to crack them; where one species became poisonous, another would evolve to tolerate the poison. Each species had its private niche—for who could live in the seas and the skies and the land at once? There was no ultimate weapon and no ultimate defense and no reason to believe any such thing was possible.

  Then came the Day of the Squishy Things.

  They had no armor. They had no claws. They had no venoms.

  If you saw a movie of a nuclear explosion going off, and you were told an Earthly life form had done it, you would never in your wildest dreams imagine that the Squishy Things could be responsible. After all, Squishy Things aren’t radioactive.

 
