Rationality: From AI to Zombies

by Eliezer Yudkowsky

Et cetera. I don’t have a brilliant solution to this problem. But it’s the sort of thing that I would wish for potential dot-com cofounders to ponder explicitly, rather than wondering how to throw sheep on Facebook. (Yes, I’m looking at you, Hacker News.) There are online activism web apps, but they tend to be along the lines of “sign this petition! yay, you signed something!” rather than “how can we counteract the bystander effect, restore motivation, and work with native group-coordination instincts, over the Internet?”

Some of the things that come to mind:

Put a video of someone asking for help online.

Put up names and photos or even brief videos if available of the first people who helped (or have some reddit-ish priority algorithm that depends on a combination of amount-helped and recency).

Give helpers a video thank-you from the founder of the cause that they can put up on their “people I’ve helped” page, which with enough standardization could be partially or wholly assembled automatically and easily embedded in their home webpage or Facebook account.

Find a non-annoying idiom for “Tell a friend about cause X”; allow referrer link codes; then show people how many others they’ve evangelized (how many people who initially got here using referrer code X actually contributed or took some other action).

(All of the above applies not just to donations, but to open-source projects to which people have contributed code. Or if people really do want nothing but signatures on a petition, then for signatures. There are ways to help besides money—even though money is usually the most effective. The main thing is that the form of help has to be verifiable online.)

Make it easier for people to offer monetary bounties on subtasks whose performance is verifiable.
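As a rough illustration of the "reddit-ish priority algorithm" mentioned in the list above, here is a minimal sketch that combines amount-helped with recency. Everything specific in it is an assumption made for illustration: the `Helper` fields, the log scaling of the amount, and the 48-hour half-life are hypothetical choices, not anything the text specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Helper:
    name: str
    amount_helped: float  # hypothetical unit, e.g. dollars or hours contributed
    hours_ago: float      # time elapsed since the contribution


def priority(helper: Helper, half_life_hours: float = 48.0) -> float:
    """Score = log-scaled amount, halved for every half-life that has passed.

    The log keeps one huge contribution from burying everyone else;
    the decay keeps early helpers from occupying the top slot forever.
    """
    recency = 0.5 ** (helper.hours_ago / half_life_hours)
    return math.log1p(helper.amount_helped) * recency


def rank_helpers(helpers, half_life_hours: float = 48.0):
    """Return helpers ordered from highest to lowest display priority."""
    return sorted(
        helpers,
        key=lambda h: priority(h, half_life_hours),
        reverse=True,
    )


helpers = [
    Helper("early_big", amount_helped=100, hours_ago=48),
    Helper("recent_big", amount_helped=100, hours_ago=0),
    Helper("recent_small", amount_helped=10, hours_ago=0),
]
ranking = [h.name for h in rank_helpers(helpers)]
```

With these made-up numbers, a small recent contribution edges out a large one that is two days old. Whether that is the right tradeoff is exactly the kind of design question the open problem leaves unanswered.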

But mostly I just hand you an open, unsolved problem: make it possible/easier for groups of strangers to coalesce into an effective task force over the Internet, in defiance of the usual failure modes and the default reasons why this is a non-ancestral problem. Think of that old statistic about Wikipedia representing 1/2,000 of the time spent in the US alone on watching television. There’s quite a lot of fuel out there, if there were only such a thing as an effective engine . . .

*

328

Incremental Progress and the Valley

Rationality is systematized winning.

“But,” you protest, “the reasonable person doesn’t always win!”

What do you mean by this? Do you mean that every week or two, someone who bought a lottery ticket with negative expected value wins the lottery and becomes much richer than you? That is not a systematic loss; it is selective reporting by the media. From a statistical standpoint, lottery winners don’t exist—you would never encounter one in your lifetime, if it weren’t for the selective reporting.
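To make "negative expected value" concrete, here is a small worked calculation. The ticket price, jackpot, and odds are invented for illustration and do not describe any real lottery.

```python
from fractions import Fraction


def expected_value(ticket_price, jackpot, win_probability):
    """Expected net gain from buying one ticket with a single prize tier."""
    return win_probability * jackpot - ticket_price


# Hypothetical lottery: $2 ticket, $100M jackpot, 1-in-300M odds.
ev = expected_value(
    ticket_price=Fraction(2),
    jackpot=Fraction(100_000_000),
    win_probability=Fraction(1, 300_000_000),
)
```

Under these assumed numbers the buyer loses five-thirds of a dollar per ticket on average, even though somebody eventually wins.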

Even perfectly rational agents can lose. They just can’t know in advance that they’ll lose. They can’t expect to underperform any other performable strategy, or they would simply perform it.

“No,” you say, “I’m talking about how startup founders strike it rich by believing in themselves and their ideas more strongly than any reasonable person would. I’m talking about how religious people are happier—”

Ah. Well, here’s the thing: An incremental step in the direction of rationality, if the result is still irrational in other ways, does not have to yield incrementally more winning.

The optimality theorems that we have for probability theory and decision theory are for perfect probability theory and decision theory. There is no companion theorem which says that, starting from some flawed initial form, every incremental modification of the algorithm that takes the structure closer to the ideal must yield an incremental improvement in performance. This has not yet been proven, because it is not, in fact, true.

“So,” you say, “what point is there then in striving to be more rational? We won’t reach the perfect ideal. So we have no guarantee that our steps forward are helping.”

You have no guarantee that a step backward will help you win, either. Guarantees don’t exist in the world of flesh; but, contrary to popular misconceptions, judgment under uncertainty is what rationality is all about.

“But we have several cases where, based on either vaguely plausible-sounding reasoning, or survey data, it looks like an incremental step forward in rationality is going to make us worse off. If it’s really all about winning—if you have something to protect more important than any ritual of cognition—then why take that step?”

Ah, and now we come to the meat of it.

I can’t necessarily answer for everyone, but . . .

My first reason is that, on a professional basis, I deal with deeply confused problems that make huge demands on precision of thought. One small mistake can lead you astray for years, and there are worse penalties waiting in the wings. An unimproved level of performance isn’t enough; my choice is to try to do better, or give up and go home.

“But that’s just you. Not all of us lead that kind of life. What if you’re just trying some ordinary human task like an Internet startup?”

My second reason is that I am trying to push some aspects of my art further than I have seen done. I don’t know where these improvements lead. The loss of failing to take a step forward is not that one step. It is all the other steps forward you could have taken, beyond that point. Robin Hanson has a saying: The problem with slipping on the stairs is not falling the height of the first step; it is that falling one step leads to falling another step. In the same way, refusing to climb one step up forfeits not the height of that step but the height of the staircase.

“But again—that’s just you. Not all of us are trying to push the art into uncharted territory.”

My third reason is that once I realize I have been deceived, I can’t just shut my eyes and pretend I haven’t seen it. I have already taken that step forward; what use to deny it to myself? I couldn’t believe in God if I tried, any more than I could believe the sky above me was green while looking straight at it. If you know everything you need to know in order to know that you are better off deceiving yourself, it’s much too late to deceive yourself.

“But that realization is unusual; other people have an easier time of doublethink because they don’t realize it’s impossible. You go around trying to actively sponsor the collapse of doublethink. You, from a higher vantage point, may know enough to expect that this will make them unhappier. So is this out of a sadistic desire to hurt your readers, or what?”

Then I finally reply that my experience so far—even in this realm of merely human possibility—does seem to indicate that, once you sort yourself out a bit and you aren’t doing quite so many other things wrong, striving for more rationality actually will make you better off. The long road leads out of the valley and higher than before, even in the human lands.

The more I know about some particular facet of the Art, the more I can see this is so. As I’ve previously remarked, my essays may be unreflective of what a true martial art of rationality would be like, because I have only focused on answering confusing questions—not fighting akrasia, coordinating groups, or being happy. In the field of answering confusing questions—the area where I have most intensely practiced the Art—it now seems massively obvious that anyone who thought they were better off “staying optimistic about solving the problem” would get stomped into the ground. By a casual student.

When it comes to keeping motivated, or being happy, I can’t guarantee that someone who loses their illusions will be better off—because my knowledge of these facets of rationality is still crude. If these parts of the Art have been developed systematically, I do not know of it. But even here I have gone to some considerable pains to dispel half-rational half-mistaken ideas that could get in a beginner’s way, like the idea that rationality opposes feeling, or the idea that rationality opposes value, or the idea that sophisticated thinkers should be angsty and cynical.

And if, as I hope, someone goes on to develop the art of fighting akrasia or achieving mental well-being as thoroughly as I have developed the art of answering impossible questions, I do fully expect that those who wrap themselves in their illusions will not begin to compete. Meanwhile—others may do better than I, if happiness is their dearest desire, for I myself have invested little effort here.

I find it hard to believe that the optimally motivated individual, the strongest entrepreneur a human being can become, is still wrapped up in a blanket of comforting overconfidence. I think they’ve probably thrown that blanket out the window and organized their mind a little differently. I find it hard to believe that the happiest we can possibly live, even in the realms of human possibility, involves a tiny awareness lurking in the corner of your mind that it’s all a lie. I’d rather stake my hopes on neurofeedback or Zen meditation, though I’ve tried neither.

But it cannot be denied that this is a very real issue in very real life. Consider this pair of comments from Less Wrong:

I’ll be honest—my life has taken a sharp downturn since I deconverted. My theist girlfriend, with whom I was very much in love, couldn’t deal with this change in me, and after six months of painful vacillation, she left me for a co-worker. That was another six months ago, and I have been heartbroken, miserable, unfocused, and extremely ineffective since.

Perhaps this is an example of the valley of bad rationality of which PhilGoetz spoke, but I still hold my current situation higher in my preference ranking than happiness with false beliefs.

And:

My empathies: that happened to me about 6 years ago (though thankfully without as much visible vacillation).

My sister, who had some Cognitive Behaviour Therapy training, reminded me that relationships are forming and breaking all the time, and given I wasn’t unattractive and hadn’t retreated into monastic seclusion, it wasn’t rational to think I’d be alone for the rest of my life (she turned out to be right). That was helpful at the times when my feelings hadn’t completely got the better of me.

So—in practice, in real life, in sober fact—those first steps can, in fact, be painful. And then things can, in fact, get better. And there is, in fact, no guarantee that you’ll end up higher than before. Even if in principle the path must go further, there is no guarantee that any given person will get that far.

If you don’t prefer truth to happiness with false beliefs . . .

Well . . . and if you are not doing anything especially precarious or confusing . . . and if you are not buying lottery tickets . . . and if you’re already signed up for cryonics, a sudden ultra-high-stakes confusing acid test of rationality that illustrates the Black Swan quality of trying to bet on ignorance in ignorance . . .

Then it’s not guaranteed that taking all the incremental steps toward rationality that you can find will leave you better off. But the vaguely plausible-sounding arguments against losing your illusions generally do consider just one single step, without postulating any further steps, without suggesting any attempt to regain everything that was lost and go it one better. Even the surveys are comparing the average religious person to the average atheist, not the most advanced theologians to the most advanced rationalists.

But if you don’t care about the truth—and you have nothing to protect—and you’re not attracted to the thought of pushing your art as far as it can go—and your current life seems to be going fine—and you have a sense that your mental well-being depends on illusions you’d rather not think about—

Then you’re probably not reading this. But if you are, then, I guess . . . well . . . (a) sign up for cryonics, and then (b) stop reading Less Wrong before your illusions collapse! RUN AWAY!

*

329

Bayesians vs. Barbarians

Previously:

Let’s say we have two groups of soldiers. In group 1, the privates are ignorant of tactics and strategy; only the sergeants know anything about tactics and only the officers know anything about strategy. In group 2, everyone at all levels knows all about tactics and strategy.

Should we expect group 1 to defeat group 2, because group 1 will follow orders, while everyone in group 2 comes up with better ideas than whatever orders they were given?

In this case I have to question how much group 2 really understands about military theory, because it is an elementary proposition that an uncoordinated mob gets slaughtered.

Suppose that a country of rationalists is attacked by a country of Evil Barbarians who know nothing of probability theory or decision theory.

Now there’s a certain viewpoint on “rationality” or “rationalism” which would say something like this:

“Obviously, the rationalists will lose. The Barbarians believe in an afterlife where they’ll be rewarded for courage; so they’ll throw themselves into battle without hesitation or remorse. Thanks to their affective death spirals around their Cause and Great Leader Bob, their warriors will obey orders, and their citizens at home will produce enthusiastically and at full capacity for the war; anyone caught skimming or holding back will be burned at the stake in accordance with Barbarian tradition. They’ll believe in each other’s goodness and hate the enemy more strongly than any sane person would, binding themselves into a tight group. Meanwhile, the rationalists will realize that there’s no conceivable reward to be had from dying in battle; they’ll wish that others would fight, but not want to fight themselves. Even if they can find soldiers, their civilians won’t be as cooperative: So long as any one sausage almost certainly doesn’t lead to the collapse of the war effort, they’ll want to keep that sausage for themselves, and so not contribute as much as they could. No matter how refined, elegant, civilized, productive, and nonviolent their culture was to start with, they won’t be able to resist the Barbarian invasion; sane discussion is no match for a frothing lunatic armed with a gun. In the end, the Barbarians will win because they want to fight, they want to hurt the rationalists, they want to conquer and their whole society is united around conquest; they care about that more than any sane person would.”

War is not fun. As many, many people have found since the dawn of recorded history, as many, many people have found before the dawn of recorded history, as some community somewhere is finding out right now in some sad little country whose internal agonies don’t even make the front pages any more.

War is not fun. Losing a war is even less fun. And it was said since the ancient times: “If thou would have peace, prepare for war.” Your opponents don’t have to believe that you’ll win, that you’ll conquer; but they have to believe you’ll put up enough of a fight to make it not worth their while.

You perceive, then, that if it were genuinely the lot of “rationalists” to always lose in war, that I could not in good conscience advocate the widespread public adoption of “rationality.”

This is probably the dirtiest topic I’ve discussed or plan to discuss here. War is not clean. Current high-tech militaries—by this I mean the US military—are unique in the overwhelmingly superior force they can bring to bear on opponents, which allows for a historically extraordinary degree of concern about enemy casualties and civilian casualties.

Winning in war has not always meant tossing aside all morality. Wars have been won without using torture. The unfunness of war does not imply, say, that questioning the President is unpatriotic. We’re used to “war” being exploited as an excuse for bad behavior, because in recent US history that pretty much is exactly what it’s been used for . . .

But reversed stupidity is not intelligence. And reversed evil is not intelligence either. It remains true that real wars cannot be won by refined politeness. If “rationalists” can’t prepare themselves for that mental shock, the Barbarians really will win; and the “rationalists” . . . I don’t want to say, “deserve to lose.” But they will have failed that test of their society’s existence.

Let me start by disposing of the idea that, in principle, ideal rational agents cannot fight a war, because each of them prefers being a civilian to being a soldier.
As has already been discussed at some length, I one-box on Newcomb’s Problem.

Consistently, I do not believe that if an election is settled by 100,000 to 99,998 votes, that all of the voters were irrational in expending effort to go to the polling place because “my staying home would not have affected the outcome.” (Nor do I believe that if the election came out 100,000 to 99,999, then 100,000 people were all, individually, solely responsible for the outcome.)

Consistently, I also hold that two rational AIs (that use my kind of decision theory), even if they had completely different utility functions and were designed by different creators, will cooperate on the true Prisoner’s Dilemma if they have common knowledge of each other’s source code. (Or even just common knowledge of each other’s rationality in the appropriate sense.)

Consistently, I believe that rational agents are capable of coordinating on group projects whenever the (expected probabilistic) outcome is better than it would be without such coordination. A society of agents that use my kind of decision theory, and have common knowledge of this fact, will end up at Pareto optima instead of Nash equilibria. If all rational agents agree that they are better off fighting than surrendering, they will fight the Barbarians rather than surrender.
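The contrast between Nash equilibria and Pareto optima can be checked mechanically on a one-shot Prisoner's Dilemma. The sketch below uses the standard textbook definitions and a conventional payoff table (3/3, 0/5, 5/0, 1/1) chosen here as an assumption. It does not implement the decision theory discussed in the text; it only shows where the two solution concepts come apart.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
# Illustrative numbers only; any table with the same ordering would do.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def is_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    a, b = profile
    u_a, u_b = PAYOFFS[profile]
    row_ok = all(PAYOFFS[(alt, b)][0] <= u_a for alt in ACTIONS)
    col_ok = all(PAYOFFS[(a, alt)][1] <= u_b for alt in ACTIONS)
    return row_ok and col_ok


def is_pareto_optimal(profile):
    """True if no other outcome helps someone without hurting anyone."""
    u = PAYOFFS[profile]
    for other in product(ACTIONS, repeat=2):
        v = PAYOFFS[other]
        weakly_better = all(x >= y for x, y in zip(v, u))
        strictly_better = any(x > y for x, y in zip(v, u))
        if weakly_better and strictly_better:
            return False
    return True


profiles = list(product(ACTIONS, repeat=2))
nash = [p for p in profiles if is_nash(p)]
pareto = [p for p in profiles if is_pareto_optimal(p)]
```

Here mutual defection is the only Nash equilibrium and also the only outcome that is not Pareto optimal, which is the gap the paragraph above claims suitably coordinated agents can close.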

Imagine a community of self-modifying AIs who collectively prefer fighting to surrender, but individually prefer being a civilian to fighting. One solution is to run a lottery, unpredictable to any agent, to select warriors. Before the lottery is run, all the AIs change their code, in advance, so that if selected they will fight as a warrior in the most communally efficient possible way—even if it means calmly marching into their own death.

(A reflectively consistent decision theory works the same way, only without the self-modification.)
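The lottery scheme can be sketched as a toy calculation. All utilities below are hypothetical placeholders, and the model assumes every agent honors its precommitment. The point is only that, evaluated before the draw, each agent can prefer the lottery to universal refusal even though being selected is individually worse than staying a civilian.

```python
import random
from fractions import Fraction


def draw_warriors(agent_ids, num_warriors, rng):
    """Select warriors uniformly at random after all agents have committed."""
    return set(rng.sample(sorted(agent_ids), num_warriors))


def expected_utility_with_lottery(num_agents, num_warriors,
                                  u_warrior, u_defended_civilian):
    """Per-agent expected utility, computed before the lottery is run."""
    p_selected = Fraction(num_warriors, num_agents)
    return p_selected * u_warrior + (1 - p_selected) * u_defended_civilian


# Hypothetical numbers: 100 agents, 10 warriors needed.
U_WARRIOR = Fraction(-50)            # selected and sent to fight
U_DEFENDED_CIVILIAN = Fraction(10)   # not selected, society defended
U_CONQUERED = Fraction(-20)          # nobody fights, Barbarians win

lottery_value = expected_utility_with_lottery(
    num_agents=100,
    num_warriors=10,
    u_warrior=U_WARRIOR,
    u_defended_civilian=U_DEFENDED_CIVILIAN,
)
warriors = draw_warriors(range(100), 10, random.Random(0))
```

With these assumed utilities the lottery is worth 4 per agent in expectation, against -20 if no one fights, so committing in advance is the better policy for everyone.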

You reply: “But in the real, human world, agents are not perfectly rational, nor do they have common knowledge of each other’s source code. Cooperation in the Prisoner’s Dilemma requires certain conditions according to your decision theory (which these margins are too small to contain) and these conditions are not met in real life.”
