by Martin Ford
YOSHUA BENGIO is Full Professor in the Department of Computer Science and Operations Research, scientific director of the Montreal Institute for Learning Algorithms (Mila), co-director of the CIFAR program on Learning in Machines and Brains, and Canada Research Chair in Statistical Learning Algorithms. Together with Ian Goodfellow and Aaron Courville, he wrote Deep Learning, one of the defining textbooks on the subject. The book is available for free from https://www.deeplearningbook.org.
Chapter 3. STUART J. RUSSELL
Once an AGI gets past kindergarten reading level, it will shoot beyond anything that any human being has ever done, and it will have a much bigger knowledge base than any human ever has.
PROFESSOR OF COMPUTER SCIENCE, UNIVERSITY OF CALIFORNIA, BERKELEY
Stuart J. Russell is widely recognized as one of the world's leading contributors to the field of artificial intelligence. He is a Professor of Computer Science and Director of the Center for Human-Compatible Artificial Intelligence at the University of California, Berkeley. Stuart is the co-author of the leading AI textbook, Artificial Intelligence: A Modern Approach, which is in use at over 1,300 colleges and universities throughout the world.
MARTIN FORD: Given that you co-wrote the standard textbook on AI in use today, I thought it might be interesting if you could define some key AI terms. What is your definition of artificial intelligence? What does it encompass? What types of computer science problems would be included in that arena? Could you compare it or contrast it with machine learning?
STUART J. RUSSELL: Let me give you, shall we say, the standard definition of artificial intelligence, which is similar to the one in the book and is now quite widely accepted: An entity is intelligent to the extent that it does the right thing, meaning that its actions are expected to achieve its objectives. The definition applies to both humans and machines. This notion of doing the right thing is the key unifying principle of AI. When we break this principle down and look deeply at what is required to do the right thing in the real world, we realize that a successful AI system needs some key abilities, including perception, vision, speech recognition, and action.
These abilities help us to define artificial intelligence. We’re talking about the ability to control robot manipulators, and everything that happens in robotics. We’re talking about the ability to make decisions, to plan, and to problem-solve. We’re talking about the ability to communicate, and so natural language understanding also becomes extremely important to AI.
We’re also talking about an ability to internally know things. It’s very hard to function successfully in the real world if you don’t actually know anything. To understand how we know things, we enter the scientific field that we call knowledge representation. This is where we study how knowledge can be stored internally and then processed by reasoning algorithms, such as automated logical deduction and probabilistic inference algorithms.
Then there is learning. Learning is a key ability for modern artificial intelligence. Machine learning has always been a subfield of AI, and it simply means improving your ability to do the right thing as a result of experience. That could be learning how to perceive better by seeing labeled examples of objects. That could also mean learning how to reason better by experience—such as discovering which reasoning steps turn out to be useful for solving a problem, and which reasoning steps turn out to be less useful.
AlphaGo, for example, is a modern AI Go program that recently beat the best human world-champion players, and it really does learn. It learns how to reason better from experience. As well as learning to evaluate positions, AlphaGo learns how to control its own deliberations so that it reaches high-quality moves more quickly, with less computation.
MARTIN FORD: Can you also define neural networks and deep learning?
STUART J. RUSSELL: Yes, in machine learning one of the standard techniques is called “supervised learning,” where we give the AI system a set of examples of a concept, along with a description and a label for each example in the set. For example, we might have a photograph, where we’ve got all the pixels in the image, and then we have a label saying that this is a photograph of a boat, or of a Dalmatian dog, or of a bowl of cherries. In supervised learning for this task, the goal is to find a predictor, or a hypothesis, for how to classify images in general.
From these supervised training examples, we try to give an AI the ability to recognize pictures of, say, Dalmatian dogs, and the ability to predict how other pictures of Dalmatian dogs might look.
One way of representing the hypothesis, or the predictor, is a neural net. A neural net is essentially a complicated circuit with many layers. The input into this circuit could be the values of pixels from pictures of Dalmatian dogs. Then, as those input values propagate through the circuit, new values are calculated at each layer of the circuit. At the end, we have the outputs of the neural network, which are the predictions about what kind of object is being recognized.
So hopefully, if there’s a Dalmatian dog in our input image, then by the time all those numbers and pixel values propagate through the neural network and all of its layers and connections, the output indicator for a Dalmatian dog will light up with a high value, and the output indicator for a bowl of cherries will have a low value. We then say that the neural network has correctly recognized a Dalmatian dog.
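To make that picture concrete, here is a minimal sketch in Python of the kind of circuit being described: pixel values go in, propagate through two layers of weighted connections, and two output indicators come out. The layer sizes, the random weights, and the two labels are purely illustrative assumptions, not any real trained model.

```python
import numpy as np

# A minimal sketch of the "circuit" described above: pixel values go in,
# pass through layers of weighted connections, and class scores come out.
# The sizes and weights here are illustrative placeholders, not a real model.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Two layers of "connection strengths" (weights), randomly initialized.
W1 = rng.normal(scale=0.1, size=(784, 64))   # pixels -> hidden layer
W2 = rng.normal(scale=0.1, size=(64, 2))     # hidden layer -> 2 output indicators

labels = ["Dalmatian dog", "bowl of cherries"]

def predict(pixels):
    """Propagate pixel values through the layers and return class scores."""
    hidden = relu(pixels @ W1)       # values computed at the first layer
    scores = softmax(hidden @ W2)    # output indicators for each class
    return dict(zip(labels, scores))

# A fake 28x28 grayscale image, flattened into 784 pixel values.
image = rng.random(784)
print(predict(image))
```

With random weights the two indicators are near chance; the point of learning, discussed next, is to adjust those weights until the right indicator lights up.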
MARTIN FORD: How do you get a neural network to recognize images?
STUART J. RUSSELL: This is where the learning process comes in. The circuit has an adjustable strength on each of its connections, and what the learning algorithm does is adjust those connection strengths so that the network tends to give the correct predictions on the training examples. Then, if you're lucky, the neural network will also give correct predictions on new images that it hasn't seen before. And that's a neural network!
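As a rough illustration of what "adjusting the connection strengths" means in practice, the toy sketch below nudges a single layer of weights by gradient descent on made-up data. The data, the learning rate, and the single-layer setup are assumptions chosen for brevity; real systems apply the same idea through many layers via backpropagation.

```python
import numpy as np

# Toy illustration of adjusting connection strengths so the network tends to
# give the correct predictions on the training examples. A single layer of
# weights is nudged by gradient descent on made-up data.

rng = np.random.default_rng(1)
X = rng.random((100, 4))                    # 100 training examples, 4 features each
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = (X @ true_w > 1.0).astype(float)        # binary labels from a hidden rule

w = np.zeros(4)                             # connection strengths, initially zero
lr = 0.5
for step in range(1000):
    p = 1 / (1 + np.exp(-(X @ w)))          # predicted probability of label 1
    grad = X.T @ (p - y) / len(y)           # gradient of the loss w.r.t. weights
    w -= lr * grad                          # adjust strengths to reduce error

p = 1 / (1 + np.exp(-(X @ w)))
print("training accuracy:", ((p > 0.5) == y).mean())
```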
Going one step further, deep learning is where we have neural networks with many layers. There is no strict threshold for how many layers make a network deep, but we would usually say that two or three layers is not a deep learning network, while four or more layers is deep learning.
Some deep learning networks get up to one thousand layers or more. By having many layers in deep learning, we can represent a very complex transformation between the input and output, by a composition of much simpler transformations, each represented by one of those layers in the network.
The deep learning hypothesis suggests that many layers make it easier for the learning algorithm to find a predictor, to set all the connection strengths in the network so that it does a good job.
We are just beginning now to get some theoretical understanding of when and why the deep learning hypothesis is correct, but to a large extent, it’s still a kind of magic, because it really didn’t have to happen that way. There seems to be a property of images in the real world, and there is some property of sound and speech signals in the real world, such that when you connect that kind of data to a deep network it will—for some reason—be relatively easy to learn a good predictor. But why this happens is still anyone’s guess.
MARTIN FORD: Deep learning is receiving enormous amounts of attention right now, and it would be easy to come away with the impression that artificial intelligence is synonymous with deep learning. But deep learning is really just one relatively small part of the field, isn’t it?
STUART J. RUSSELL: Yes, it would be a huge mistake for someone to think that deep learning is the same thing as artificial intelligence, because the ability to distinguish Dalmatian dogs from bowls of cherries is useful but it is still only a very small part of what we need to give an artificial intelligence in order for it to be successful. Perception and image recognition are both important aspects of operating successfully in the real world, but deep learning is only one part of the picture.
AlphaGo, and its successor AlphaZero, created a lot of media attention around deep learning with stunning advances in Go and chess, but they're really a hybrid of classical search-based AI and a deep learning algorithm that evaluates each game position that the classical AI system searches through. While the ability to distinguish between good and bad positions is central to AlphaGo, it cannot play world-champion-level Go just by deep learning.
Self-driving car systems also use a hybrid of classical search-based AI and deep learning. Self-driving cars are not just pure deep learning systems, because that does not work very well. Many driving situations need classical rules for an AI to be successful. For example, if you’re in the middle lane and you want to change lanes to the right, and there’s someone trying to pass you on the inside, then you should wait for them to go by first before you pull over. For road situations that require lookahead, because no satisfactory rule is available, it may be necessary to imagine various actions that the car could take as well as the various actions that other cars might take, and then decide if those outcomes are good or bad.
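That hybrid can be sketched in code, under heavily simplified assumptions: one hand-written rule covers the clear-cut case of someone passing on the inside, and a small worst-case lookahead over imagined joint actions covers the rest. Every name, action set, and score in this sketch is invented for illustration; it is not how any production driving stack is written.

```python
# A toy sketch of the hybrid described above: a classical rule handles the
# clear-cut case, and a one-step lookahead over imagined actions handles
# situations where no satisfactory rule exists. All names, actions, and
# scores are invented for illustration.

EFFECT = {"hold_lane": 0, "change_right": 1, "slow_down": -1,
          "maintain": 0, "accelerate": 1, "brake": -1}

def simulate(state, ours, theirs):
    # Stub: a real system would roll a traffic model forward in time.
    return state + EFFECT[ours] - EFFECT[theirs]

def evaluate(state):
    # Stub: a real system would score safety, progress, and comfort.
    return -abs(state)

def lookahead(state):
    """Imagine each action we could take, assume the other car responds as
    badly for us as possible, and pick the action whose worst case is best."""
    our_actions = ["hold_lane", "change_right", "slow_down"]
    their_actions = ["maintain", "accelerate", "brake"]
    def worst_case(ours):
        return min(evaluate(simulate(state, ours, theirs))
                   for theirs in their_actions)
    return max(our_actions, key=worst_case)

def lane_change_decision(want_right, car_passing_on_inside, state):
    # Classical rule: if someone is passing on the inside, wait for them first.
    if want_right and car_passing_on_inside:
        return "wait"
    # No clear rule applies: fall back on lookahead over imagined outcomes.
    return lookahead(state)

print(lane_change_decision(want_right=True, car_passing_on_inside=False, state=2))
```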
While perception is very important, and deep learning lends itself well to perception, there are many different types of ability that we need to give an AI system. This is particularly true when we're talking about activities that span long timescales, like going on a vacation, or very complex projects, like building a factory. There's no possibility that those kinds of activities can be orchestrated by purely deep learning black-box systems.
Let me take the factory example to close my point about the limitations of deep learning here. Let’s imagine we try to use deep learning to build a factory. (After all, we humans know how to build a factory, don’t we?) So, we’ll take billions of previous examples of building factories to train a deep learning algorithm; we’ll show it all the ways that people have built factories. We take all that data and we put it into a deep learning system and then it knows how to build factories. Could we do that? No, it’s just a complete pipe dream. There is no such data, and it wouldn’t make any sense, even if we had it, to try to build factories that way.
We need knowledge to build factories. We need to be able to construct plans. We need to be able to reason about physical obstructions and the structural properties of the buildings. We can build AI systems to work out these real-world problems, but it isn’t achieved by deep learning. Building a factory requires a different type of AI altogether.
MARTIN FORD: Are there recent advances in AI that have struck you as being more than just incremental? What would you point to that is at the absolute forefront of the field right now?
STUART J. RUSSELL: It’s a good question, because a lot of the things that are in the news at the moment are not really conceptual breakthroughs, they are just demos. The chess victory of Deep Blue over Kasparov is a perfect example. Deep Blue was basically a demo of algorithms that were designed 30 years earlier and had been gradually enhanced and then deployed on increasingly powerful hardware, until they could beat a world chess champion. But the actual conceptual breakthroughs behind Deep Blue were in how to design a chess program: how the lookahead works; the alpha-beta algorithm for reducing the amount of searching that had to be done; and some of the techniques for designing the evaluation functions. So, as is often the case, the media described the victory of Deep Blue over Kasparov as a breakthrough when in fact, the breakthrough had occurred decades earlier.
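For readers who have not seen it, the alpha-beta idea Russell refers to can be shown in a few lines: minimax lookahead that skips branches which provably cannot change the final decision. The tiny hand-built game tree below is illustrative only; Deep Blue applied the same principle to chess positions with a far more elaborate, hand-tuned evaluation function.

```python
# A compact sketch of alpha-beta pruning: minimax lookahead that skips
# ("prunes") branches which provably cannot affect the final choice.
# The tiny hand-made game tree below is purely illustrative.

def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or isinstance(node, (int, float)):   # leaf: evaluation value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # opponent will never allow this branch
                break                # prune the remaining children
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Leaves are static evaluations of positions two plies ahead.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, 2, float("-inf"), float("inf"), True))   # -> 3
```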
The same thing is still happening today as well. For instance, a lot of the recent AI reports about perception and speech recognition, and headlines about dictation accuracy being close to or exceeding human dictation accuracy, are all very impressive practical engineering results, but they are again demos of conceptual breakthroughs that happened much earlier—from the early deep learning systems and convolutional networks that date right back to the late ‘80s and early ‘90s.
It's been something of a surprise that we already had the tools decades ago to do perception successfully; we just weren't using them properly. By applying modern engineering to older breakthroughs, by collecting large datasets and processing them across very large networks on the latest hardware, we've managed to create a lot of recent interest in AI, but these results have not necessarily been at the real forefront of the field.
MARTIN FORD: Do you think DeepMind’s AlphaZero is a good example of a technology that’s right on the frontier of AI research?
STUART J. RUSSELL: I think AlphaZero was interesting. To me, it was not particularly a surprise that you could use the same basic software that played Go to also play chess and Shogi at world-champion level. So, it was not at the forefront of AI in that sense.
I mean, it certainly gives you pause when you think that AlphaZero, in the space of less than twenty-four hours, learned to play at superhuman levels in three different games using the same software. But that’s more a vindication of an approach to AI that says that if you have a clear understanding of the problem class, especially deterministic, two-player, turn-taking, fully-observable games with known rules, then those kinds of problems are amenable to a well-designed class of AI algorithms. And these algorithms have been around for some time—algorithms that can learn good evaluation functions and use classical methods for controlling search.
It’s also clear that if you want to extend those techniques to other classes of problems, you’re going to have to come up with different algorithmic structures. For example, partial observability—meaning that you can’t see the board, so to speak—requires a different class of algorithm. There’s nothing AlphaZero can do to play poker, for example, or to drive a car. Those tasks require an AI system that can estimate things that it can’t see. AlphaZero assumes that the pieces on the board are the pieces on the board, and that’s that.
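As a rough illustration of what "estimating things you can't see" involves, the sketch below maintains a probability over a hidden state and updates it with Bayes' rule when a noisy observation arrives. The states, sensor model, and numbers are all invented; partially observable games like poker, and driving systems, use much richer versions of the same idea.

```python
# A toy sketch of estimating a hidden state: maintain a belief (a probability
# over states we cannot observe directly) and update it with Bayes' rule as
# noisy observations arrive. All states and probabilities are invented.

# Hidden state: is the car ahead about to brake? We can't observe it directly.
belief = {"braking": 0.2, "cruising": 0.8}

# How likely each observation is under each hidden state (a sensor model).
likelihood = {
    "brake_lights_on":  {"braking": 0.9, "cruising": 0.1},
    "brake_lights_off": {"braking": 0.1, "cruising": 0.9},
}

def update(belief, observation):
    posterior = {s: belief[s] * likelihood[observation][s] for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

belief = update(belief, "brake_lights_on")
print(belief)   # the probability of "braking" rises well above 0.2
```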
MARTIN FORD: There was also a poker-playing AI system developed at Carnegie Mellon University, called Libratus. Did they achieve a genuine AI breakthrough there?
STUART J. RUSSELL: Carnegie Mellon’s Libratus poker AI was another very impressive hybrid AI example: it was a combination of several different algorithmic contributions that were pieced together from research that’s happened over the last 10 or 15 years. There has been a lot of progress in dealing with games like poker, which are games of partial information. One of the things that happens with partial-information games, like poker, is that you must have a randomized playing strategy because if, say, you always bluff, then people figure out that you’re bluffing and then they call your bluff. But if you never bluff, then you can never steal a game from your opponent when you have a weak hand. It’s long been known, therefore, that for these kinds of card games, you should randomize your playing behavior, and bluff with a certain probability.
The key to playing poker extremely well is adjusting those probabilities for how to bet; that is, how often to bet more than your hand really justifies, and how often to bet less. The calculations for these probabilities are feasible for an AI, and they can be done very exactly, but only for small versions of poker, for example where there are only a few cards in a pack. It’s very hard for an AI to do these calculations accurately for the full game of poker. As a result, over the decade or so that people have been working on scaling up poker, we’ve gradually seen improvements in the accuracy and efficiency of how to calculate these probabilities for larger and larger versions of poker.
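To illustrate why those bluffing probabilities can be computed exactly for small games, here is a toy calculation under invented assumptions: a one-round betting game where a weak hand bluffs with probability b and the opponent calls with probability c. A grid search picks the bluffing frequency whose worst case is best. The game, payoffs, and grid resolution are made up for the example and bear no resemblance to full-scale poker solving.

```python
import numpy as np

# A toy illustration of why bluffing must be randomized and how the right
# frequency can be computed exactly for tiny games. In this invented game,
# each player antes 1; with probability 0.5 our hand is strong. Strong hands
# always bet 1; weak hands bluff (bet) with probability b, otherwise check
# and lose the showdown. The opponent calls a bet with probability c.

def ev(b, c):
    """Our expected winnings given bluffing frequency b and calling frequency c."""
    strong = c * 2 + (1 - c) * 1                           # bet with a strong hand
    weak = (1 - b) * (-1) + b * (c * (-2) + (1 - c) * 1)   # check, or bluff
    return 0.5 * strong + 0.5 * weak

grid = np.linspace(0, 1, 101)
# For each bluffing frequency, assume the opponent responds as badly for us as
# possible, then pick the frequency whose worst case is best (the maximin choice).
worst_case = [min(ev(b, c) for c in grid) for b in grid]
best = grid[int(np.argmax(worst_case))]
print(f"bluff with probability ~{best:.2f}")   # roughly one weak hand in three
```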
So yes, Libratus is another impressive modern AI application. But whether the techniques are at all scalable, given that it has taken a decade to go from one version of poker to another slightly larger version of poker, I'm not convinced. I think there's also a reasonable question about how much those game-theoretic ideas in poker extend into the real world. The world is certainly full of other agents, so our decisions ought to be game-theoretic, and yet we're not aware of randomizing very much in our day-to-day lives.
MARTIN FORD: Self-driving cars are one of the highest-profile applications of AI. What is your estimate for when fully autonomous vehicles will become a truly practical technology? Imagine you're in a random place in Manhattan, and you call up an Uber, and it's going to arrive with no one in it, and then it will take you to another random place that you specify. How far off is that, realistically, do you think?
STUART J. RUSSELL: Yes, the timeline for self-driving cars is a concrete question, and it’s also an economically important question because companies are investing a great deal in these projects.
It is worth noting that the first actual self-driving car, operating on a public road, was 30 years ago! That was Ernst Dickmanns’ demo in Germany of a car driving on the freeway, changing lanes, and overtaking other vehicles. The difficulty of course is trust: while you can run a successful demonstration for a short time, you need an AI system to run for decades with no significant failures in order to qualify as a safe vehicle.
The challenge, then, is to build an AI system that people are willing to trust themselves and their kids to, and I don’t think we’re quite there.
Results from vehicles that are being tested in California at the moment indicate that human safety drivers still feel the need to intervene as often as once per mile of road testing. There are more successful AI driving projects, such as Waymo, the Google subsidiary working on this, that have some respectable records; but they are still, I think, several years away from being able to do this in a wide range of conditions.
Most of these tests have been conducted in good conditions on well-marked roads. And as you know, when you’re driving at night and it’s pouring with rain, and there are lights reflecting off the road, and there may also be roadworks, and they might have moved the lane markers, and so on ... if you had followed the old lane markers, you’d have driven straight into a wall by now. I think in those kinds of circumstances, it’s really hard for AI systems. That’s why I think that we’ll be lucky if the self-driving car problem is solved sufficiently in the next five years.