Now, suddenly some people in the AI field are saying that AI is never going to succeed, and so there isn’t anything to worry about.
This is a completely pathological reaction if you ask me. It seems prudent, just as with nuclear energy and atomic weapons, to assume that human ingenuity will, in fact, overcome the obstacles and achieve intelligence of a kind that's sufficient to present, at least potentially, the threat of ceding control. It seems prudent to prepare for that and try to figure out how to design systems in such a way that this can't happen. So that's my goal: to help us prepare for the artificial intelligence threat.
MARTIN FORD: How should we address that threat?
STUART J. RUSSELL: The key to the problem is that we have made a slight mistake in the way that we define AI, and so I have constructed a new definition for AI that goes as follows.
First of all, if we want to build artificial intelligence, we'd better figure out what it means to be intelligent. This means that we must draw from thousands of years of tradition in philosophy, economics, and other disciplines. The idea is that a human being is intelligent to the extent that their actions can be expected to achieve their objectives. This is the idea sometimes called rational behavior, and it contains within it various sub-kinds of intelligence, like the ability to reason, the ability to plan, the ability to perceive, and so on. Those are all capabilities required for acting intelligently in the real world.
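One compact way to write this notion of rational behavior, using standard decision-theoretic notation rather than anything Russell states explicitly, is that a rational agent chooses

$$ a^{*} = \operatorname*{arg\,max}_{a} \; \mathbb{E}\big[\, U(\text{outcome}) \mid a \,\big], $$

where $U$ encodes the agent's own objectives and the expectation reflects its uncertainty about how the world will respond to the action $a$.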
The problem with that is that if we succeed in creating artificial intelligence and machines with those abilities, then unless their objectives happen to be perfectly aligned with those of humans, we've created something that's extremely intelligent, but with objectives that are different from ours. And if that AI is more intelligent than we are, then it's going to achieve its objectives, and we, probably, are not!
The negative consequences for humans are without limit. The mistake is in the way we have transferred the notion of intelligence, a concept that makes sense for humans, over to machines.
We don’t want machines with our type of intelligence. We actually want machines whose actions can be expected to achieve our objectives, not their objectives.
The original idea we had for AI was that to make an intelligent machine, we should construct optimizers: things that choose actions really well when we give them an objective. Then off it goes and achieves our objective. That’s probably a mistake. It’s worked up to now—but only because we haven’t made very intelligent machines, and the ones we have made we’ve only put in mini-worlds, like the simulated chessboard, the simulated Go board, and so on.
When the AI systems that humans have made so far get out into the real world, that's when things can go wrong, and we saw an example of this with the flash crash. With the flash crash, there were a bunch of trading algorithms, some of them fairly simple, but some of them fairly complicated AI-based decision-making and learning systems. Out there in the real world, during the flash crash, things went catastrophically wrong and those machines crashed the stock market. They eliminated more than a trillion dollars of value in equities in the space of a few minutes. The flash crash was a warning signal about our AI.
The right way to think about AI is that we should be making machines that act in ways that help us achieve our objectives, but where we absolutely do not put our objectives directly into the machine!
My vision is that AI must always be designed to try to help us achieve our objectives, but that AI systems should not be assumed to know what those objectives are.
If we make AI this way, then there is always an explicit uncertainty about the nature of the objectives that an AI is obliged to pursue. It turns out that this uncertainty actually is the margin of safety that we require.
I’ll give you an example to demonstrate this margin of safety that we really do need. Let’s go back to an old idea that we can—if we ever need to—just switch the machine off if we get into trouble. Well, of course, you know, if the machine has an objective like, “fetch the coffee,” then obviously a sufficiently intelligent machine realizes that if someone switches it off, then it’s not going to be able to fetch the coffee. If its life’s mission, if its objective, is to fetch the coffee, then logically it will take steps to prevent itself from being switched off. It will disable the Off switch. It will possibly neutralize anyone who might attempt to switch it off. So, you can imagine all these unanticipated consequences of a simple objective like “fetch the coffee,” when you have a sufficiently intelligent machine.
Now in my vision for AI, we instead design the machine so that although it still wants to “fetch the coffee” it understands that there are a lot of other things that human beings might care about, but it doesn’t really know what those are! In that situation, the AI understands that it might do something that the human doesn’t like—and if the human switches it off, that’s to prevent something that would make the human unhappy. Since in this vision the goal of the machine is to avoid making the human unhappy, even though the AI doesn’t know what that means, it actually has an incentive to allow itself to be switched off.
We can take this particular vision for AI and put it into mathematics, and show that the margin of safety (meaning, in this case, the incentive that the machine has to allow itself to be switched off) is directly related to the uncertainty it has about the human objective. As we eliminate that uncertainty, and the machine starts to believe that it knows, for sure, what the true objective really is, then that margin of safety begins to disappear again, and the machine will ultimately stop us from switching it off.
In this way, we can show, at least in a simplified mathematical framework, that when you design machines this way, with explicit uncertainty about the objective they are to pursue, they can be provably beneficial, meaning that you are provably better off with this machine than without it.
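To make the relationship between uncertainty and the incentive to allow shutdown concrete, here is a minimal numerical sketch in the spirit of the analysis Russell describes. The Gaussian belief over the human's utility, the specific payoffs, and the three available options are illustrative assumptions, not his formal model:

```python
import numpy as np

rng = np.random.default_rng(0)

def incentive_to_defer(mean, std, n=100_000):
    """Estimate the robot's incentive to let the human decide.

    The robot believes the unknown human utility u of its proposed action
    is Normal(mean, std). Its options:
      act now            -> expected value E[u]
      switch itself off  -> 0
      defer to the human -> the human, who knows u, allows the action only
                            if u > 0, so the value is E[max(u, 0)]
    """
    u = rng.normal(mean, std, n)
    act_now = u.mean()
    defer = np.maximum(u, 0).mean()
    best_unilateral = max(act_now, 0.0)   # best the robot can do on its own
    return defer - best_unilateral        # >= 0, and grows with uncertainty

for std in [0.0, 0.5, 1.0, 2.0]:
    print(f"belief std = {std}: incentive to defer ≈ {incentive_to_defer(0.5, std):.3f}")
```

When the robot's belief has zero spread, deferring to the human is worth nothing extra and the incentive vanishes; as the uncertainty grows, so does the incentive to leave the off switch in human hands.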
What I've shared here is an indication that there may be a way of conceiving of AI that is a little bit different from how we've been thinking about it so far, and that there are ways to build AI systems with much better properties in terms of safety and control.
MARTIN FORD: Related to these issues of AI safety and control, a lot of people worry about an arms race with other countries, especially China. Is that something we should take seriously, something we should be very concerned about?
STUART J. RUSSELL: Nick Bostrom and others have raised a concern that, if a party feels that strategic dominance in AI is a critical part of their national security and economic leadership, then that party will be driven to develop the capabilities of AI systems—as fast as possible, and yes, without worrying too much about the controllability issues.
At a high level, that sounds like a plausible argument. On the other hand, as we produce AI products that can operate out there in the real world, there will be a clear economic incentive to make sure that they remain under control.
To explore this kind of scenario, let's think about a product that might come along fairly soon: a reasonably intelligent personal assistant that keeps track of your activities, conversations, relationships and so on, and helps run your life in the way that a good professional human assistant might. Now, if such a system does not have a good understanding of human preferences, and acts in ways that are unsafe in the ways we've already talked about, then people are simply not going to buy it. If it misunderstands these things, then it might book you into a $20,000-a-night hotel room, or it might cancel a meeting with the vice president because you're supposed to go to the dentist.
In those kinds of situations, the AI is misunderstanding your preferences and, rather than being humble about its understanding of your preferences, it thinks that it knows what you want, and it is just plain wrong about it. I've cited in other forums the example of a domestic robot that doesn't understand that the nutritional value of a cat is a lot less than the sentimental value of a cat, and so it just decides to cook the cat for dinner.
If that happened, that would be the end of the domestic robot industry. No one is going to want a robot in their house that could make that kind of mistake.
Today, AI companies that are producing increasingly intelligent products have to solve at least a version of this problem in order for their products to be good AI systems.
We need to get the AI community to understand that AI that is not controllable and safe is just not good AI, in the same way that a bridge that falls down is simply not a good bridge. Civil engineers don't go around saying, “Oh yeah, I design bridges that don’t fall down, you know, unlike the other guy, he designs bridges that fall down.” It’s just built into the meaning of the word “bridge” that it’s not supposed to fall down.
This should be built into what we mean when we define AI. We need to define AI in such a way that it remains under the control of the humans that it's supposed to be working for, in any country. And we need to define AI so that it has, now and in the future, the property that we call corrigibility: that it can be switched off, and that it can be corrected if it's doing something that we don't like.
If we can get everyone in AI, around the world, to understand that these are just necessary characteristics of good AI, then I think we move a long way forward in making the future prospects of the field of AI much, much brighter.
There’s also no better way to kill the field of AI than to have a major control failure, just as the nuclear industry killed itself through Chernobyl and Fukushima. AI will kill itself if we fail to address the control issue.
MARTIN FORD: So, on balance, are you an optimist? Do you think that things are going to work out?
STUART J. RUSSELL: Yes, I do think that I’m an optimist. I think there’s a long way to go. We are just scratching the surface of this control problem, but the first scratching seems to be productive, and so I’m reasonably optimistic that there is a path of AI development that leads us to what we might describe as “provably beneficial AI systems.”
Of course, there is the risk that even if we do solve the control problem and even if we do build provably beneficial AI systems, there will be some parties who choose not to use them. The risk here is that one party or another chooses only to magnify the capabilities of AI, without regard for the safety aspects.
This could be the Dr. Evil character type, the Austin Powers villain who wants to take over the world and accidentally releases an AI system that ends up being catastrophic for everyone. Or it could be a much more sociological risk, where it starts off as very nice for society to have capable, controllable AI but we then overuse it. In those risk scenarios, we head towards an enfeebled human society where we’ve moved too much of our knowledge and too much of our decision-making into machines, and we can never recover it. We could eventually lose our entire agency as humans along this societal path.
This societal picture is how the future is depicted in the WALL-E movie, where humanity is off on spaceships and being looked after by machines. Humanity gradually becomes fatter and lazier and stupider. That’s an old theme in science fiction and it’s very clearly illustrated in the WALL-E movie. That is a future that we need to be concerned about, assuming we successfully navigate all the other risks that we’ve been discussing.
As an optimist, I can also see a future where AI systems are well enough designed that they’re saying to humans, “Don’t use us. Get on and learn stuff yourself. Keep your own capabilities, propagate civilization through humans, not through machines.”
Of course, we might still ignore a helpful and well-designed AI if we prove to be too lazy and greedy as a race; and then we'll pay the price. In that sense, this really might become more of a sociocultural problem, and I do think that we need to work as a human race to prepare and make sure this doesn't happen.
STUART J. RUSSELL is a professor of electrical engineering and computer science at the University of California, Berkeley, and is widely recognized as one of the world's leading contributors to the field of artificial intelligence. He is the co-author, along with Peter Norvig, of Artificial Intelligence: A Modern Approach, the leading AI textbook, currently in use at over 1,300 colleges and universities in 118 countries.
Stuart received his undergraduate degree in Physics from Wadham College, Oxford, in 1982 and his PhD in Computer Science from Stanford in 1986. His research has covered many topics related to AI, such as machine learning, knowledge representation, and computer vision, and he has received numerous awards and distinctions, including the IJCAI Computers and Thought Award and election as a Fellow of the American Association for the Advancement of Science, the Association for the Advancement of Artificial Intelligence, and the Association for Computing Machinery.
Chapter 4. GEOFFREY HINTON
In the past when AI has been overhyped—including backpropagation in the 1980s—people were expecting it to do great things, and it didn’t actually do things as great as they hoped. Today, it’s already done great things, so it can’t possibly all be just hype.
EMERITUS DISTINGUISHED PROFESSOR OF COMPUTER SCIENCE, UNIVERSITY OF TORONTO
VICE PRESIDENT & ENGINEERING FELLOW, GOOGLE
Geoffrey Hinton is sometimes known as the Godfather of Deep Learning, and he has been the driving force behind some of its key technologies, such as backpropagation, Boltzmann machines, and capsule networks. In addition to his roles at Google and the University of Toronto, he is also Chief Scientific Advisor of the Vector Institute for Artificial Intelligence.
MARTIN FORD: You’re most famous for working on the backpropagation algorithm. Could you explain what backpropagation is?
GEOFFREY HINTON: The best way to explain it is by explaining what it isn't. When most people think about neural networks, there's an obvious algorithm for training them: Imagine you have a network that has layers of neurons, with an input at the bottom layer and an output at the top layer. Each connection between neurons has a weight associated with it. What each neuron does is look at the neurons in the layer below, multiply the activity of each of those neurons by the weight on its connection, add all that up, and give an output that's a function of that sum. By adjusting the weights on the connections, you can get networks that do anything you like, such as looking at a picture of a cat and labeling it as a cat.
The question is, how should you adjust the weights so that the network does what you want? There’s a very simple algorithm that will actually work but is incredibly slow—it’s a dumb mutation algorithm—where you start with random weights on all the connections, and you give your network a set of examples and see how well it works. You then take one of those weights, and you change it a little bit, and now you give it another set of examples to see if it works better or worse than it did before. If it works better than it did before, you keep the change you made. If it works worse than it did before, you don’t keep that change, or perhaps you change the weight in the opposite direction. Then you take another weight, and you do the same thing.
You have to go around all of the weights, and for each weight, you have to measure how well the network does on a set of examples, with each weight having to be updated multiple times. It is an incredibly slow algorithm, but it works, and it’ll do whatever you want.
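As a concrete illustration, here is a small numpy sketch of that dumb perturbation procedure on a toy two-layer network. The architecture, data, step size, and iteration count are illustrative assumptions, and for simplicity the loss is re-measured on one fixed batch rather than on fresh sets of examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network on a fixed batch of examples (illustrative only).
X = rng.normal(size=(32, 4))                      # 32 examples, 4 inputs each
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)
W1 = rng.normal(size=(4, 8)) * 0.1                # input -> hidden weights
W2 = rng.normal(size=(8, 1)) * 0.1                # hidden -> output weights

def loss(W1, W2):
    h = np.maximum(X @ W1, 0)                     # hidden layer (ReLU)
    out = 1 / (1 + np.exp(-(h @ W2)))             # output layer (sigmoid)
    return float(np.mean((out - y) ** 2))

print("initial loss:", loss(W1, W2))

# "Dumb" perturbation training: nudge ONE weight, measure again, and keep the
# change only if the loss improved; otherwise try the opposite direction.
step = 0.01
for _ in range(2000):
    W = W1 if rng.random() < 0.5 else W2          # pick a weight matrix
    i = tuple(rng.integers(s) for s in W.shape)   # pick one weight in it
    before = loss(W1, W2)
    W[i] += step
    if loss(W1, W2) >= before:                    # no better: undo, try -step
        W[i] -= 2 * step
        if loss(W1, W2) >= before:                # still no better: restore
            W[i] += step

print("loss after perturbation training:", loss(W1, W2))
```

Every candidate change to a single weight costs a full evaluation of the network, which is why this procedure scales so badly as the number of weights grows.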
Backpropagation is basically a way of achieving the same thing. It’s a way of tinkering with the weights so that the network does what you want, but unlike the dumb algorithm, it’s much, much faster. It’s faster by a factor of how many weights there are in the network. If you’ve got a network with a billion weights, backpropagation is going to be a billion times faster than the dumb algorithm.
The dumb algorithm works by adjusting one of the weights slightly and then measuring how well the network does. For evolution, that's what you've got to do, because the process that takes you from your genes to the finished product depends on the environment you're in. There's no way you can predict exactly what the phenotype will look like from the genotype, or how successful the phenotype will be, because that depends on what's going on in the world.
In a neural net, however, the process takes you from the input and the weights to how successful you are in producing the right output. You have control over that whole process because it's all going on inside the neural net; you know all the weights that are involved. Backpropagation makes use of all that by sending information backward through the net. Using the fact that it knows all the weights, it can compute in parallel, for every single weight in the network, whether you should make that weight a little bit bigger or smaller to improve the output.
The difference is that in evolution, you measure the effect of a change, and in backpropagation, you compute what the effect would be of making a change, and you can do that for all the weights at once with no interference. With backpropagation you can adjust the weights rapidly because you can give it a few examples, then backpropagate the discrepancies between what it said and what it should have said, and now you can figure out how to change all of the weights simultaneously to make all of them a little bit better. You still need to do the process a number of times, but it’s much faster than the evolutionary approach.
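For contrast, here is the same toy network trained with backpropagation (again an illustrative sketch, with an assumed learning rate and iteration count, not code from Hinton): one forward pass measures the discrepancy at the output, and one backward pass through the known weights computes an adjustment for every weight at once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same toy setup as in the perturbation sketch above.
X = rng.normal(size=(32, 4))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)
W1 = rng.normal(size=(4, 8)) * 0.1
W2 = rng.normal(size=(8, 1)) * 0.1

def forward(W1, W2):
    h = np.maximum(X @ W1, 0)                  # hidden layer (ReLU)
    out = 1 / (1 + np.exp(-(h @ W2)))          # output layer (sigmoid)
    return h, out

_, out = forward(W1, W2)
print("initial mean squared error:", float(np.mean((out - y) ** 2)))

lr = 0.5
for _ in range(2000):
    h, out = forward(W1, W2)                   # forward pass
    err = out - y                              # discrepancy at the output

    # Backward pass: send the discrepancy back through the known weights to
    # compute, in one sweep, how every weight should change.
    d_pre2 = err * out * (1 - out)             # back through the sigmoid
    dW2 = h.T @ d_pre2
    d_h = (d_pre2 @ W2.T) * (h > 0)            # back through the ReLU
    dW1 = X.T @ d_h

    W1 -= lr * dW1 / len(X)                    # adjust all weights at once
    W2 -= lr * dW2 / len(X)

_, out = forward(W1, W2)
print("mean squared error after backpropagation:", float(np.mean((out - y) ** 2)))
```

Each iteration updates all of the weights for roughly the cost of evaluating the network once, which is the source of the speed advantage described above.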
MARTIN FORD: The backpropagation algorithm was originally created by David Rumelhart, correct? And you took that work forward?
GEOFFREY HINTON: Lots of different people invented different versions of backpropagation before David Rumelhart. They were mainly independent inventions, and it’s something I feel I’ve got too much credit for. I’ve seen things in the press that say I invented backpropagation, and that’s completely wrong. It’s one of these rare cases when an academic feels he’s got too much credit for something! My main contribution was to show how you can use it for learning distributed representations, so I’d like to set the record straight on that.