Architects of Intelligence
Chapter 11. RAY KURZWEIL
The scenario that I have is that we will send medical nanorobots into our bloodstream. [...] These robots will also go into the brain and provide virtual and augmented reality from within the nervous system rather than from devices attached to the outside of our bodies.
DIRECTOR OF ENGINEERING AT GOOGLE
Ray Kurzweil is one of the world’s leading inventors, thinkers, and futurists. He has received 21 honorary doctorates, and honors from three US presidents. He is the recipient of the MIT Lemelson Prize for innovation and in 1999, he received the National Medal of Technology, the nation’s highest honor in technology, from President Clinton. Ray is also a prolific writer, authoring 5 national bestsellers. In 2012, Ray became a Director of Engineering at Google—heading up a team of engineers developing machine intelligence and natural language understanding. Ray’s first novel, Danielle, Chronicles of a Superheroine, is being published in early 2019. Another book by Ray, The Singularity is Nearer, is expected to be published in late 2019.
MARTIN FORD: How did you come to start out in AI?
RAY KURZWEIL: I first got involved in AI in 1962, which was only 6 years after the term was coined by Marvin Minsky and John McCarthy at the 1956 Dartmouth Conference in Hanover, New Hampshire.
The field of AI had already bifurcated into two warring camps: the symbolic school and the connectionist school. The symbolic school was definitely in the ascendancy with Marvin Minsky regarded as its leader. The connectionists were the upstarts, and one such person was Frank Rosenblatt at Cornell University, who had the first popularized neural net called the perceptron. I wrote them both letters and they both invited me to come up, so I first went to visit Minsky, where he spent all day with me and we struck up a rapport that would last for 55 years. We talked about AI, which at the time was a very obscure field that nobody was really paying attention to. He asked who I was going to see next, and when I mentioned Dr. Rosenblatt, he said that I shouldn’t bother.
I then went to go and see Dr. Rosenblatt, who had this single-layer neural net called the perceptron; it was a hardware device that had a camera. I brought some printed letters to my meeting with Dr. Rosenblatt where his device recognized them perfectly as long as they were in Courier 10.
Other type styles didn’t work as well, and he said, “Don’t worry, I can take the output of the perceptron and feed it as the input to a second perceptron, then we can take the output of that and feed it to a third layer, and as we add layers it’ll get smarter and generalize and be able to do all these remarkable things.” I responded, “Have you tried that?” and he said, “Well, not yet, but it’s high on our research agenda.”
Things didn’t move quite as quickly back in the 1960s as they do today, and sadly he died 9 years later in 1971 never having tried that idea. The idea was remarkably prescient, however. All of the excitement we see now in neural nets is due to these deep neural networks with many layers. It was a pretty remarkable insight, as it really was not obvious that it would work.
In 1969, Minsky wrote his book, Perceptrons, with his colleague, Seymour Papert. The book basically proved a theorem that a perceptron could not devise answers that required the use of the XOR logical function, nor could it solve the connectedness problem. There are two maze-like images on the cover of that book, and if you look carefully, you can see one is fully connected, and the other is not. Making that classification is called the connectedness problem. The theorem proved that a perceptron could not do that. The book was very successful in killing all funding for connectionism for the next 25 years, which is something Minsky regretted, as shortly before he died he told me that he now appreciated the power of deep neural nets.
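As an editorial illustration of the XOR limitation described above, here is a minimal Python sketch. It is not code from Perceptrons or from Rosenblatt's hardware; the function names and the hand-picked weights in `xor_two_layer` are assumptions made for this example. A single threshold unit trained with the classic perceptron rule learns AND, cannot fit XOR because no single straight line separates the XOR cases, and a hand-wired second layer (the kind of stacking Rosenblatt proposed) computes XOR correctly.

```python
# Illustrative sketch: a single-layer perceptron versus the XOR function.
# All names here (predict, train_perceptron, xor_two_layer) are hypothetical.

def step(z):
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def predict(weights, bias, x):
    """Output of one threshold unit on a pair of binary inputs."""
    return step(weights[0] * x[0] + weights[1] * x[1] + bias)

def train_perceptron(samples, epochs=100):
    """Classic perceptron learning rule on two binary inputs."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                # Nudge the weights toward the target on every mistake.
                weights[0] += delta * x[0]
                weights[1] += delta * x[1]
                bias += delta
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, bias

def accuracy(weights, bias, samples):
    """Fraction of samples the unit classifies correctly."""
    hits = sum(predict(weights, bias, x) == target for x, target in samples)
    return hits / len(samples)

def xor_two_layer(x):
    """Hand-wired two-layer network: XOR is (x0 OR x1) and not (x0 AND x1)."""
    h_or = step(x[0] + x[1] - 0.5)    # hidden unit acting as OR
    h_and = step(x[0] + x[1] - 1.5)   # hidden unit acting as AND
    return step(h_or - h_and - 0.5)   # output unit: OR but not AND

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_DATA = [(x, x[0] & x[1]) for x in INPUTS]
XOR_DATA = [(x, x[0] ^ x[1]) for x in INPUTS]

if __name__ == "__main__":
    print("AND accuracy:", accuracy(*train_perceptron(AND_DATA), AND_DATA))
    print("XOR accuracy:", accuracy(*train_perceptron(XOR_DATA), XOR_DATA))
    print("two-layer XOR:", [xor_two_layer(x) for x in INPUTS])
```

The single unit reaches full accuracy on AND but stays below it on XOR however long it trains, while the two-layer version gets all four XOR cases right.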
MARTIN FORD: Marvin Minsky did work on early connectionist neural nets back in the ’50s, though, right?
RAY KURZWEIL: That’s right, but he became disillusioned with them by the 1960s, and really didn’t appreciate the power of multi-layer neural nets. It was not apparent until decades later when 3-layer neural nets were tried and they worked somewhat better. There was a problem with using too many layers, because of the exploding gradient or vanishing gradient problem, which is basically where the dynamic range of the values of the coefficients would decline because the numbers got too big or too small.
Geoffrey Hinton and a group of mathematicians solved that problem and now we can go to any number of levels. Their solution was that you recalibrate the information after each level, so it doesn’t outstrip the range of values that can be represented and these 100-layer neural nets have been very successful. There’s still a problem though, which is summarized by the motto, “Life begins at a billion examples.”
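The effect described here can be seen in a toy Python sketch. This is an editorial illustration only: multiplying by a fixed `scale` stands in for a real layer, and rescaling to unit root-mean-square is an assumed stand-in for "recalibrating after each level". It is not the actual method used in any production network, and the names `propagate` and `rms` are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square magnitude of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def propagate(values, scale, layers, renormalize):
    """Pass a signal through `layers` toy layers.

    Each toy layer just multiplies every value by `scale` (an assumption
    standing in for a real weight matrix). When `renormalize` is True, the
    signal is rescaled to unit RMS after every layer.
    """
    for _ in range(layers):
        values = [scale * v for v in values]
        if renormalize:
            size = rms(values)
            values = [v / size for v in values]
    return values

if __name__ == "__main__":
    signal = [1.0, -2.0, 3.0]
    print("shrinking, raw:      ", rms(propagate(signal, 0.5, 100, False)))
    print("growing, raw:        ", rms(propagate(signal, 2.0, 100, False)))
    print("shrinking, rescaled: ", rms(propagate(signal, 0.5, 100, True)))
    print("growing, rescaled:   ", rms(propagate(signal, 2.0, 100, True)))
```

Without rescaling, 100 layers push the signal toward zero or toward astronomically large values; with rescaling after each layer, its magnitude stays close to 1 regardless of depth.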
One of the reasons I’m here at Google is that we do have a billion examples of some things like pictures of dogs and cats and other image categories that are annotated, but there are also lots of things we don’t have a billion examples of. We have lots of examples of language, but they’re not annotated with what they mean, and how could we annotate them anyway using language that we can’t understand in the first place? There’s a certain category of problems where we can work around that, and playing Go is a good example. The DeepMind system was trained on all of the online moves, which is on the order of a million moves. That’s not a billion. That created a fair amateur player, but they need another 999 million examples, so where are they going to get them from?
MARTIN FORD: What you’re getting at is that deep learning right now is very dependent on labeled data and what’s called supervised learning.
RAY KURZWEIL: Right. One way to work around it is if you can simulate the world you’re working in, then you can create your own training data, and that’s what DeepMind did by having it play itself. They could annotate the moves with traditional annotation methods. Subsequently AlphaZero actually trained a neural net to improve on the annotation, so it was able to defeat AlphaGo 100 games to 0 starting with no human training data.
The question is, in what situations can you do that? For example, we can do that in math, because we can simulate math. The axioms of number theory are no more complicated than the rules of Go.
Another situation is self-driving cars, even though driving is much more complex than a board game or the axioms of a math system. The way that worked is that Waymo created a pretty good system with a combination of methods and then drove millions of miles with humans at the wheel ready to take over. That generated enough data to create an accurate simulator of the world of driving. They’ve now driven on the order of a billion miles with simulated vehicles in the simulator, which has generated training data for a deep neural net designed to improve the algorithms. This has worked even though the world of driving is much more complex than a board game.
The next exciting area to attempt to simulate is the world of biology and medicine. If we could simulate biology, and it’s not impossible, then we could do clinical trials in hours rather than years, and we could generate our own data just like we’re doing with self-driving cars or board games or math.
That’s not the only approach to the problem of providing sufficient training data. Humans can learn from much less data because we engage in transfer learning, using learning from situations which may be fairly different from what we are trying to learn. I have a different model of learning based on a rough idea of how the human neocortex works. In 1962 I came up with a thesis on how I thought the human brain works, and I’ve been thinking about thinking for the last 50 years. My model is not one big neural net, but rather many small modules, each of which can recognize a pattern. In my book, How to Create a Mind, I describe the neocortex as basically 300 million of those modules, and each can recognize a sequential pattern and accept a certain amount of variability. The modules are organized in a hierarchy, which is created through their own thinking. The system creates its own hierarchy.
That hierarchical model of the neocortex can learn from much less data. It’s the same with humans. We can learn from a small amount of data because we can generalize information from one domain to another.
Larry Page, one of the co-founders of Google, liked my thesis in How to Create a Mind and recruited me to Google to apply those ideas to understanding language.
MARTIN FORD: Do you have any real-world examples of you applying those concepts to a Google product?
RAY KURZWEIL: Smart Reply on Gmail (which provides three suggestions to reply to each email) is one application from my team that uses this hierarchical system. We just introduced Talk to Books (https://books.google.com/talktobooks/), where you ask a question in natural language and the system then reads 100,000 books in a half-second—that’s 600 million sentences—and then returns the best answers that it can find from those 600 million sentences. It’s all based on semantic understanding, not keywords.
At Google we’re making progress in natural language, and language was the first invention of the neocortex. Language is hierarchical; we can share the hierarchical ideas we have in our neocortex with each other using the hierarchy of language. I think Alan Turing was prescient in basing the Turing test on language because I think it does require the full range of human thinking and human intelligence to create and understand language at human levels.
MARTIN FORD: Is your ultimate objective to extend this idea to actually build a machine that can pass the Turing test?
RAY KURZWEIL: Not everybody agrees with this, but I think the Turing test, if organized correctly, is actually a very good test of human-level intelligence. The issue is that in the brief paper that Turing wrote in 1950, it’s really just a couple of paragraphs that talked about the Turing test, and he left out vital elements. For example, he didn’t describe how to actually go about administering the test. The rules of the test are very complicated when you actually administer it, but if a computer is to pass a valid Turing test, I believe it will need to have the full range of human intelligence. Understanding language at human levels is the ultimate goal. If an AI could do that, it could read all documents and books and learn everything else. We’re getting there a little bit at a time. We can understand enough of the semantics, for example to enable our Talk to Books application to come up with reasonable answers to questions, but it’s still not at human levels. Mitch Kapor and I have a long-range bet on this for $20,000, with the proceeds to go to the charity of the winner’s choice. I’m saying that an AI will pass the Turing test by 2029, whereas he’s saying no.
MARTIN FORD: Would you agree that for the Turing test to be an effective test of intelligence, there probably shouldn’t be a time limit at all? Just tricking someone for 15 minutes seems like a gimmick.
RAY KURZWEIL: Absolutely, and if you look at the rules that Mitch Kapor and I came up with, we gave a number of hours, and maybe even that’s not enough time. The bottom line is that if an AI is really convincing you that it’s human, then it passes the test. We can debate how long that needs to be—probably several hours if you have a sophisticated judge—but I agree that if the time is too short, then you might get away with simple tricks.
MARTIN FORD: I think it’s easy to imagine an intelligent computer that just isn’t very good at pretending to be human because it would be an alien intelligence. So, it seems likely that you could have a test where everyone agreed that the machine was intelligent, even though it didn’t actually seem to be human. And we would probably want to recognize that as an adequate test as well.
RAY KURZWEIL: Whales and octopi have large brains and they exhibit intelligent behavior, but they’re obviously not in a position to pass the Turing test. A Chinese person who speaks Mandarin and not English would not pass the English Turing test, so there are lots of ways to be intelligent without passing the test. The key statement is the converse: In order to pass the test, you have to be intelligent.
MARTIN FORD: Do you believe that deep learning, combined with your hierarchical approach, is really the way forward, or do you think there needs to be some other massive paradigm shift in order to get us to AGI/human-level intelligence?
RAY KURZWEIL: No, I think humans use this hierarchical approach. Each of these modules is capable of doing learning, and I actually make the case in my book that in the brain they’re not doing deep learning in each module, they’re doing something equivalent to a Markov process, but it actually is better to use deep learning.
In our systems at Google we use deep learning to create vectors that represent the patterns in each module and then we have a hierarchy that goes beyond the deep learning paradigm. I think that’s sufficient for AGI, though. The hierarchical approach is how the human brain does it in my view, and there’s a lot of evidence now for that from the brain reverse engineering projects.
There’s an argument that human brains follow a rule-based system rather than a connectionist one. People point out that humans are capable of having sharp distinctions and we’re capable of doing logic. A key point is that connectionism can emulate a rule-based approach. A connectionist system in a certain situation might be so certain of its judgment that it looks and acts like a rule-based system, but then it’s also able to deal with rare exceptions and the nuances of its apparent rules.
A rule-based system really cannot emulate a connectionist system, so the converse statement is not the case. Doug Lenat’s “Cyc” is an impressive project, but I believe that it proves the limitations of a rule-based system. You reach a complexity ceiling, where the rules get so complex that if you try to fix one thing, you break three other things.
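One way to picture the claim that a connectionist unit can look and act like a rule is the following Python sketch. It is an editorial illustration, not a description of any system at Google; the function name `soft_rule` and the `threshold` and `sharpness` parameters are assumptions chosen for the example. A sigmoid unit with a steep slope behaves almost exactly like the hard rule "answer yes if x is above the threshold", yet still gives graded answers for borderline inputs.

```python
import math

def soft_rule(x, threshold=0.5, sharpness=50.0):
    """Sigmoid unit that approximates the hard rule `x > threshold`.

    A large `sharpness` makes the output nearly 0 or 1 away from the
    threshold, while inputs close to the threshold get intermediate values.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (x - threshold)))

if __name__ == "__main__":
    for x in (0.1, 0.48, 0.5, 0.52, 0.9):
        print(f"x={x:.2f} -> confidence {soft_rule(x):.3f}")
```

Far from the threshold the unit is effectively a yes/no rule; near it, the output reflects uncertainty, which is the kind of nuance a fixed rule cannot express.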
MARTIN FORD: Cyc is the project where people are manually trying to enter logic rules for common sense?
RAY KURZWEIL: Right. I’m not sure of the count, but they have a vast number of rules. They had a mode where it could print out its reasoning for a behavior, and the explanations would go on for a number of pages and were very hard to follow. It’s impressive work, but it does show that this is really not the approach, at least not by itself, and it’s not how humans achieve intelligence. We don’t have cascades of rules that we go through; we have this hierarchical self-organizing approach.
I think another advantage of a hierarchical but connectionist approach is that it’s better at explaining itself because you can look at the modules in the hierarchy and see which module influences which decision. When you have these massive 100-layer neural nets, they act like a big black box. It’s very hard to understand their reasoning, though there have been some attempts to do that. I do think that this hierarchical spin on a connectionist approach is an effective approach, and that’s how humans think.
MARTIN FORD: There are some structures, though, in the human brain, even at birth. For example, babies can recognize faces.
RAY KURZWEIL: We do have some feature generators. For example, in our brains we have this module called the fusiform gyrus that contains specialized circuitry and computes certain ratios, like the ratio of the tip of the nose to the end of the nose, or the distance between the eyes. There is a set of a dozen or so fairly simple features, and experiments have shown that if we generate those features from images and then generate new images that have the same features (the same ratios), then people will immediately recognize them as a picture of that same person, even though other details have changed quite a bit in the image. There are various feature generators like that, some for audio information, where we compute certain ratios and recognize partial overtones, and these features then feed into the hierarchical connectionist system. So, it is important to understand these feature generators, and there are some very specific features in recognizing faces, and that’s what babies rely on.
MARTIN FORD: I’d like to talk about the path and the timing for Artificial General Intelligence (AGI). I’m assuming AGI and human-level AI are equivalent terms.
RAY KURZWEIL: They’re synonyms, and I don’t like the term AGI because I think it’s an implicit criticism of AI. The goal of AI has always been to achieve greater and greater intelligence and ultimately to reach human levels of intelligence. As we’ve progressed, though, we’ve spun off separate fields. For example, once we mastered recognizing characters, it became the separate field of OCR. The same happened with speech recognition and robotics, and it was felt that the overarching field of AI was no longer focusing on general intelligence. My view is always that we’ll get to general intelligence step by step by solving one problem at a time.
Another bit of color on that is that human performance in any type of task spans a very broad range. What is the human performance level in Go? It’s a broad range from a child who’s playing their first game to the world champion. One thing we’ve seen is that once a computer can achieve human levels, even at the low end of that range, it very quickly soars past human performance. A little over a year ago computers were playing at a low level in Go and then they quickly soared past that. More recently, AlphaZero soared past AlphaGo and beat it 100 games to 0, after training for a few hours.
Computers are also improving in their language understanding, but not at the same rate, because they don’t yet have sufficient real-world knowledge. Computers currently can’t do multi-chain reasoning very well, basically taking inferences from multiple statements while at the same time considering real-world knowledge. For example, on a third-grade language understanding test, a computer didn’t understand that if a boy had muddy shoes he probably got them muddy by walking in the mud outside and if he got the mud on the kitchen floor it would make his mother mad. That may all seem obvious to us humans because we may have experienced that, but it’s not obvious to the AI.
I don’t think the process will be as quick to go from the average adult comprehension performance that we have now for computers on some language tests to superhuman performance because I think there are more fundamental issues to solve to do that. Nonetheless, human performance is a broad range, as we’ve seen, and once computers get in that range they can ultimately soar past it to become superhuman. The fact that they’re performing at any kind of adult level in language understanding is very impressive because I feel that language requires the full range of human intelligence, and has the full range of human ambiguity and hierarchical thinking. To sum up, yes, AI is making very rapid progress and yes, all of this is using connectionist approaches.