Rationality: From AI to Zombies

by Eliezer Yudkowsky


  But this essay was written for those who have something to protect.

  What can a twelfth-century peasant do to save themselves from annihilation? Nothing. Nature’s little challenges aren’t always fair. When you run into a challenge that’s too difficult, you suffer the penalty; when you run into a lethal penalty, you die. That’s how it is for people, and it isn’t any different for planets. Someone who wants to dance the deadly dance with Nature does need to understand what they’re up against: Absolute, utter, exceptionless neutrality.

  Knowing this won’t always save you. It wouldn’t save a twelfth-century peasant, even if they knew. If you think that a rationalist who fully understands the mess they’re in must surely be able to find a way out—then you trust rationality, enough said.

  Some commenter is bound to castigate me for putting too dark a tone on all this, and in response they will list out all the reasons why it’s lovely to live in a neutral universe. Life is allowed to be a little dark, after all; but not darker than a certain point, unless there’s a silver lining.

  Still, because I don’t want to create needless despair, I will say a few hopeful words at this point:

  If humanity’s future unfolds in the right way, we might be able to make our future light cone fair(er). We can’t modify fundamental physics, but on a higher level of organization we could build some guardrails and put down some padding; organize the particles into a pattern that does some internal checks against catastrophe. There’s a lot of stuff out there that we can’t touch—but it may help to consider everything that isn’t in our future light cone as being part of the “generalized past.” As if it had all already happened. There’s at least the prospect of defeating neutrality, in the only future we can touch—the only world that it accomplishes something to care about.

  Someday, maybe, immature minds will reliably be sheltered. Even if children go through the equivalent of not getting a lollipop, or even burning a finger, they won’t ever be run over by cars.

  And the adults wouldn’t be in so much danger. A superintelligence—a mind that could think a trillion thoughts without a misstep—would not be intimidated by a challenge where death is the price of a single failure. The raw universe wouldn’t seem so harsh, would be only another problem to be solved.

  The problem is that building an adult is itself an adult challenge. That’s what I finally realized, years ago.

  If there is a fair(er) universe, we have to get there starting from this world—the neutral world, the world of hard concrete with no padding, the world where challenges are not calibrated to your skills.

  Not every child needs to stare Nature in the eyes. Buckling a seat belt, or writing a check, is not that complicated or deadly. I don’t say that every rationalist should meditate on neutrality. I don’t say that every rationalist should think all these unpleasant thoughts. But anyone who plans on confronting an uncalibrated challenge of instant death must not avoid them.

  What does a child need to do—what rules should they follow, how should they behave—to solve an adult problem?

  *

  303

  My Bayesian Enlightenment

  I remember (dimly, as human memories go) the first time I self-identified as a “Bayesian.” Someone had just asked a malformed version of an old probability puzzle, saying:

  If I meet a mathematician on the street, and she says, “I have two children, and at least one of them is a boy,” what is the probability that they are both boys?

  In the correct version of this story, the mathematician says, “I have two children,” and you ask, “Is at least one a boy?,” and she answers, “Yes.” Then the probability is 1/3 that they are both boys.

  But in the malformed version of the story—as I pointed out—one would common-sensically reason:

  If the mathematician has one boy and one girl, then my prior probability for her saying “at least one of them is a boy” is 1/2 and my prior probability for her saying “at least one of them is a girl” is 1/2. There’s no reason to believe, a priori, that the mathematician will only mention a girl if there is no possible alternative.

  So I pointed this out, and worked the answer using Bayes’s Rule, arriving at a probability of 1/2 that the children were both boys. I’m not sure whether or not I knew, at this point, that Bayes’s rule was called that, but it’s what I used.
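  The two versions of the puzzle can be worked side by side. What follows is only a minimal sketch, not the calculation as it was originally done. It assumes the four family types (BB, BG, GB, GG) are equally likely, and that in the malformed version a mathematician with one boy and one girl mentions either sex with probability 1/2, as in the common-sense reasoning above. The function names are illustrative and do not come from the text.

```python
from fractions import Fraction

# Assumption: the four (older, younger) family types are equally likely.
FAMILIES = ["BB", "BG", "GB", "GG"]
PRIOR = Fraction(1, 4)


def p_both_boys(likelihood):
    """Posterior P(BB | evidence) via Bayes's Rule.

    `likelihood(family)` returns P(evidence | family).
    """
    joint = {fam: PRIOR * likelihood(fam) for fam in FAMILIES}
    total = sum(joint.values())
    return joint["BB"] / total


def answers_yes(fam):
    # Correct version: you ask "Is at least one a boy?" and she answers
    # truthfully, so she says "Yes" with probability 1 whenever a boy exists.
    return Fraction(1) if "B" in fam else Fraction(0)


def volunteers_boy(fam):
    # Malformed version (assumed model): she volunteers a true "at least one
    # is a ..." statement, mentioning a boy with probability equal to the
    # fraction of her children who are boys (1 for BB, 1/2 for BG or GB, 0 for GG).
    return Fraction(fam.count("B"), 2)


asked = p_both_boys(answers_yes)           # Fraction(1, 3)
volunteered = p_both_boys(volunteers_boy)  # Fraction(1, 2)
print(asked, volunteered)
```

  Setting the likelihood to 1 for every family with a boy reproduces the 1/3 answer. That is the sense in which the "orthodox" count silently assumes the probability of the statement is 1.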

  And lo, someone said to me, “Well, what you just gave is the Bayesian answer, but in orthodox statistics the answer is 1/3. We just exclude the possibilities that are ruled out, and count the ones that are left, without trying to guess the probability that the mathematician will say this or that, since we have no way of really knowing that probability—it’s too subjective.”

  I responded—note that this was completely spontaneous—“What on Earth do you mean? You can’t avoid assigning a probability to the mathematician making one statement or another. You’re just assuming the probability is 1, and that’s unjustified.”

  To which the one replied, “Yes, that’s what the Bayesians say. But frequentists don’t believe that.”

  And I said, astounded: “How can there possibly be such a thing as non-Bayesian statistics?”

  That was when I discovered that I was of the type called “Bayesian.” As far as I can tell, I was born that way. My mathematical intuitions were such that everything Bayesians said seemed perfectly straightforward and simple, the obvious way I would do it myself; whereas the things frequentists said sounded like the elaborate, warped, mad blasphemy of dreaming Cthulhu. I didn’t choose to become a Bayesian any more than fishes choose to breathe water.

  But this is not what I refer to as my “Bayesian enlightenment.” The first time I heard of “Bayesianism,” I marked it off as obvious; I didn’t go much further in than Bayes’s Rule itself. At that time I still thought of probability theory as a tool rather than a law. I didn’t think there were mathematical laws of intelligence (my best and worst mistake). Like nearly all AGI wannabes, Eliezer2001 thought in terms of techniques, methods, algorithms, building up a toolbox full of cool things he could do; he searched for tools, not understanding. Bayes’s Rule was a really neat tool, applicable in a surprising number of cases.

  Then there was my initiation into heuristics and biases. It started when I ran across a webpage that had been transduced from a PowerPoint intro to behavioral economics. It mentioned some of the results of heuristics and biases, in passing, without any references. I was so startled that I emailed the author to ask if this was actually a real experiment, or just anecdotal. He sent me back a scan of Tversky and Kahneman’s 1973 paper.

  Embarrassing to say, my story doesn’t really start there. I put it on my list of things to look into. I knew that there was an edited volume called “Judgment Under Uncertainty: Heuristics and Biases,” but I’d never seen it. At this time, I figured that if it wasn’t online, I would just try to get along without it. I had so many other things on my reading stack, and no easy access to a university library. I think I must have mentioned this on a mailing list, because Emil Gilliam was annoyed by my online-only theory, so he bought me the book.

  His action here should probably be regarded as scoring a fair number of points.

  But this, too, is not what I refer to as my “Bayesian enlightenment.” It was an important step toward realizing the inadequacy of my Traditional Rationality skillz—that there was so much more out there, all this new science, beyond just doing what Richard Feynman told you to do. And seeing the heuristics-and-biases program holding up Bayes as the gold standard helped move my thinking forward—but not all the way there.

  Memory is a fragile thing, and mine seems to have become more fragile than most, since I learned how memories are recreated with each recollection—the science of how fragile they are. Do other people really have better memories, or do they just trust the details their mind makes up, while really not remembering any more than I do? My guess is that other people do have better memories for certain things. I find structured, scientific knowledge easy enough to remember; but the disconnected chaos of everyday life fades very quickly for me.

  I know why certain things happened in my life—that’s causal structure I can remember. But sometimes it’s hard to recall even in what order certain events happened to me, let alone in what year.

  I’m not sure if I read E. T. Jaynes’s Probability Theory: The Logic of Science before or after the day when I realized the magnitude of my own folly, and understood that I was facing an adult problem.

  But it was Probability Theory that did the trick. Here was probability theory, laid out not as a clever tool, but as The Rules, inviolable on pain of paradox. If you tried to approximate The Rules because they were too computationally expensive to use directly, then, no matter how necessary that compromise might be, you would still end up doing less than optimal. Jaynes would do his calculations different ways to show that the same answer always arose when you used legitimate methods; and he would display different answers that others had arrived at, and trace down the illegitimate step. Paradoxes could not coexist with his precision. Not an answer, but the answer.

  And so—having looked back on my mistakes, and all the an-answers that had led me into paradox and dismay—it occurred to me that here was the level above mine.

  I could no longer visualize trying to build an AI based on vague answers—like the an-answers I had come up with before—and surviving the challenge.

  I looked at the AGI wannabes with whom I had tried to argue Friendly AI, and the various dreams of Friendliness that they had. (Often formulated spontaneously in response to my asking the question!) Like frequentist statistical methods, no two of them agreed with each other. Having actually studied the issue full-time for some years, I knew something about the problems their hopeful plans would run into. And I saw that if you said, “I don’t see why this would fail,” the “don’t know” was just a reflection of your own ignorance. I could see that if I held myself to a similar standard of “that seems like a good idea,” I would also be doomed. (Much like a frequentist inventing amazing new statistical calculations that seemed like good ideas.)

  But if you can’t do that which seems like a good idea—if you can’t do what you don’t imagine failing—then what can you do?

  It seemed to me that it would take something like the Jaynes-level—not, here’s my bright idea, but rather, here’s the only correct way you can do this (and why)—to tackle an adult problem and survive. If I achieved the same level of mastery of my own subject as Jaynes had achieved of probability theory, then it was at least imaginable that I could try to build a Friendly AI and survive the experience.

  Through my mind flashed the passage:

  Do nothing because it is righteous, or praiseworthy, or noble, to do so; do nothing because it seems good to do so; do only that which you must do, and which you cannot do in any other way.1

  Doing what it seemed good to do had only led me astray.

  So I called a full stop.

  And I decided that, from then on, I would follow the strategy that could have saved me if I had followed it years ago: Hold my FAI designs to the higher standard of not doing that which seemed like a good idea, but only that which I understood on a sufficiently deep level to see that I could not do it in any other way.

  All my old theories, into which I had invested so much, did not meet this standard; and were not close to this standard; and weren’t even on a track leading to this standard; so I threw them out the window.

  I took up the study of probability theory and decision theory, looking to extend them to embrace such things as reflectivity and self-modification.

  If I recall correctly, I had already, by this point, started to see cognition as manifesting Bayes-structure, which is also a major part of what I refer to as my Bayesian enlightenment—but of this I have already spoken. And there was also my naturalistic awakening, of which I have already spoken. And my realization that Traditional Rationality was not strict enough, so that in matters of human rationality I began taking more inspiration from probability theory and cognitive psychology.

  But if you add up all these things together, then that, more or less, is the story of my Bayesian enlightenment.

  Life rarely has neat boundaries. The story continues onward.

  It was while studying Judea Pearl, for example, that I realized that precision can save you time. I’d put some thought into nonmonotonic logics myself, before then—back when I was still in my “searching for neat tools and algorithms” mode. Reading Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,2 I could imagine how much time I would have wasted on ad-hoc systems and special cases, if I hadn’t known that key. “Do only that which you must do, and which you cannot do in any other way” translates into a time-savings measured, not in the rescue of wasted months, but in the rescue of wasted careers.

  And so I realized that it was only by holding myself to this higher standard of precision that I had started to really think at all about quite a number of important issues. To say a thing with precision is difficult—it is not at all the same thing as saying a thing formally, or inventing a new logic to throw at the problem. Many shy away from the inconvenience, because human beings are lazy, and so they say, “It is impossible,” or, “It will take too long,” even though they never really tried for five minutes. But if you don’t hold yourself to that inconveniently high standard, you’ll let yourself get away with anything. It’s a hard problem just to find a standard high enough to make you actually start thinking! It may seem taxing to hold yourself to the standard of mathematical proof where every single step has to be correct and one wrong step can carry you anywhere. But otherwise you won’t chase down those tiny notes of discord that turn out to, in fact, lead to whole new concerns you never thought of.

  So these days I don’t complain as much about the heroic burden of inconvenience that it takes to hold yourself to a precise standard. It can save time, too; and in fact, it’s more or less the ante to get yourself thinking about the problem at all.

  And this too should be considered part of my “Bayesian enlightenment”—realizing that there were advantages in it, not just penalties.

  But of course the story continues on. Life is like that, at least the parts that I remember.

  If there’s one thing I’ve learned from this history, it’s that saying “Oops” is something to look forward to. Sure, the prospect of saying “Oops” in the future means that the you of right now is a drooling imbecile, whose words your future self won’t be able to read because of all the wincing. But saying “Oops” in the future also means that, in the future, you’ll acquire new Jedi powers that your present self doesn’t dream exist. It makes you feel embarrassed, but also alive. Realizing that your younger self was a complete moron means that even though you’re already in your twenties, you haven’t yet gone over your peak. So here’s to hoping that my future self realizes I’m a drooling imbecile: I may plan to solve my problems with my present abilities, but extra Jedi powers sure would come in handy.

  That scream of horror and embarrassment is the sound that rationalists make when they level up. Sometimes I worry that I’m not leveling up as fast as I used to, and I don’t know if it’s because I’m finally getting the hang of things, or because the neurons in my brain are slowly dying.

  Yours, Eliezer2008.

  *

  1. Le Guin, The Farthest Shore.

  2. Pearl, Probabilistic Reasoning in Intelligent Systems.

  Part Y

  Challenging the Difficult

  304

  Tsuyoku Naritai! (I Want to Become Stronger)

  In Orthodox Judaism there is a saying: “The previous generation is to the next one as angels are to men; the next generation is to the previous one as donkeys are to men.” This follows from the Orthodox Jewish belief that all Judaic law was given to Moses by God at Mount Sinai. After all, it’s not as if you could do an experiment to gain new halachic knowledge; the only way you can know is if someone tells you (who heard it from someone else, who heard it from God). Since there is no new source of information, it can only be degraded in transmission from generation to generation.

  Thus, modern rabbis are not allowed to overrule ancient rabbis. Crawly things are ordinarily unkosher, but it is permissible to eat a worm found in an apple—the ancient rabbis believed the worm was spontaneously generated inside the apple, and therefore was part of the apple. A modern rabbi cannot say, “Yeah, well, the ancient rabbis knew diddly-squat about biology. Overruled!” A modern rabbi cannot possibly know a halachic principle the ancient rabbis did not, because how could the ancient rabbis have passed down the answer from Mount Sinai to him? Knowledge derives from authority, and therefore is only ever lost, not gained, as time passes.

  When I was first exposed to the angels-and-donkeys proverb in (religious) elementary school, I was not old enough to be a full-blown atheist, but I still thought to myself: “Torah loses knowledge in every generation. Science gains knowledge with every generation. No matter where they started out, sooner or later science must surpass Torah.”

  The most important thing is that there should be progress. So long as you keep moving forward you will reach your destination; but if you stop moving you will never reach it.

  Tsuyoku naritai is Japanese. Tsuyoku is “strong”; naru is “becoming,” and the form naritai is “want to become.” Together it means “I want to become stronger,” and it expresses a sentiment embodied more intensely in Japanese works than in any Western literature I’ve read. You might say it when expressing your determination to become a professional Go player—or after you lose an important match, but you haven’t given up—or after you win an important match, but you’re not a ninth-dan player yet—or after you’ve become the greatest Go player of all time, but you still think you can do better. That is tsuyoku naritai, the will to transcendence.

 
