Chapter 19. DAVID FERRUCCI
I don’t think, as other people might, that we don’t know how to do [AGI] and we’re waiting for some enormous breakthrough. I don’t think that’s the case, I think we do know how to do it, we just need to prove that.
FOUNDER, ELEMENTAL COGNITION
DIRECTOR OF APPLIED AI, BRIDGEWATER ASSOCIATES
David Ferrucci built and led the IBM Watson team from its inception to its landmark success in 2011 when Watson defeated the greatest Jeopardy! players of all time. In 2015 he founded his own company, Elemental Cognition, focused on creating novel AI systems that dramatically accelerate a computer’s ability to understand language.
MARTIN FORD: How did you become interested in computers? What’s the path that led you to AI?
DAVID FERRUCCI: I started back before computers were an everyday term. My parents wanted me to become a medical doctor, and my dad hated the fact that I would be home during the school holidays without anything to do. In the summer of my junior year at high school, my dad looked in the paper and found a math class for me at a local college. It turned out that it was actually a programming class using BASIC on DEC computers. I thought it was phenomenal because you could give this machine instructions, and if you could articulate the procedure or the algorithm that you’re going through in your head you could get the machine to do it for you. The machine could store the data AND the thought process. I imagined this was my way out! If I could get the machine to think and memorize everything for me, then I wouldn’t have to do all of that work to become a doctor.
It got me interested in what it meant to store information, to reason over it, to think, and to systematize or to turn into an algorithm whatever process was going on in my brain. If I could just specify that in enough detail, then I could get the computer to do it, and that was enthralling. It was just a mind-altering realization.
I didn’t know the words “artificial intelligence” at the time, but I got very interested in the whole notion of coordinated intelligence from a mathematical, algorithmic, and philosophical perspective. I believed that modeling human intelligence in the machine was possible. There was no reason to think that it wasn’t.
MARTIN FORD: Did you follow that with computer science at college?
DAVID FERRUCCI: No, I had no idea about careers in computer science or AI, so I went to college and majored in biology to become a medical doctor. During my studies, I got my grandparents to buy me an Apple II computer, and I just started programming everything I could think of. I ended up programming a lot of software for my college, from graphing software for experimental lab work, to ecology simulation software, to analog-to-digital interfacing for lab equipment. This, of course, was before any of this stuff even existed, never mind being able to just download it from the internet. I decided to do as much computer science as I could in my last year of college, so I did a minor in it. I graduated with the top biology award and I was ready to go to medical school, when I decided it just wasn't for me.
Instead, I went to graduate school for computer science, and AI in particular. I decided that was what I was passionate about, and that’s what I wanted to study. So, I did my master’s at Rensselaer Polytechnic Institute (RPI) in New York, where I developed a semantic network system as part of my thesis. I called it COSMOS, which I am sure stood for something related to cognition and sounded cool, but I can’t remember the precise expansion. COSMOS represented knowledge and language, and could perform limited forms of logical reasoning.
I was giving a presentation of COSMOS at a sort of industrial science fair at RPI in 1985 when some folks from the IBM Watson Research Center, who had just started their own AI project, saw me presenting and they asked me if I wanted a job. My original plan had been to stay on and get my PhD, but a few years before this I’d seen an ad in a magazine to become an IBM Research Fellow where you could research whatever you want with unlimited resources—that sounded like my dream job, so I’d cut that ad out and pinned it on my bulletin board. When these people from IBM’s Research Center offered me that job, I took it.
So, in 1985 I started working on an AI project at IBM Research, but then a couple of years later, the 1980s’ AI winter had hit, and IBM was going around canceling every project that was associated with AI. I was told that they would be able to put me to work on other projects, but I didn’t want to work on other projects, I wanted to work on AI, so I decided to quit IBM. My dad was mad at me. He was already pissed I didn’t become a doctor, then by some miracle I had gotten a good job anyway and now I was quitting two years later. That just did not sound like a good thing to him.
I went back to RPI and did my PhD on non-monotonic reasoning. I designed and built a medical expert system called CARE (Cardiac and Respiratory Expert) and just learned a lot more about AI during that period. To support my studies, I also worked on a government contract building an object-oriented circuit design system at RPI. After completing my PhD, I needed to look for work. My dad had gotten pretty sick and he lived down in Westchester, where IBM was also based. I wanted to be near him, so I called some people I knew from my earlier IBM days and ended up going back to IBM Research.
IBM was not an AI company at that point, but 15 years later, with Watson and other projects, I had helped to shape it in that direction. I never gave up my desire to work on AI, and I built a skilled team over the years and engaged in every opportunity to work in areas like language processing, text and multimedia analytics, and automatic question answering. By the time there was this interest in doing Jeopardy!, I was the only one in IBM who believed it could be done and had a team capable of doing it. With Watson’s huge success, IBM was able to transform itself into an AI company.
MARTIN FORD: I don’t want to focus much on your work with Watson, as that’s already a very well-documented story. I’d like to talk about how you were thinking about AI, after you left IBM.
DAVID FERRUCCI: The way I think about AI is that there’s perception—recognizing things, there’s control—doing things, and there’s knowing—building, developing, and understanding the conceptual models that provide the foundation of communication, and the development of theories and ideas.
One of the interesting things I learned working on the Watson project was that pure statistical approaches were limited in the "understanding" part, that is, their ability to produce causal and consumable explanations for their predictions or their answers. Purely data-driven or statistical approaches to prediction are very powerful for perception tasks, such as pattern recognition, voice recognition, and image recognition, and for control tasks, such as driverless cars and robotics, but in the knowledge space AI is struggling.
We’ve seen huge advances in voice and image recognition and in general, perception-related stuff. We’ve also seen huge advances in the control systems that you see driving drones and all kinds of robotic driverless cars. When it comes to fluently communicating with a computer based on what it has read and understood, we’re not even close to there yet.
MARTIN FORD: More recently in 2015 you started a company called Elemental Cognition. Could you tell us more about that?
DAVID FERRUCCI: Elemental Cognition is an AI research venture that’s trying to do real language understanding. It’s trying to deal with that area of AI that we still have not cracked, which is, can we create an AI that reads, dialogs, and builds understanding?
A human being might read books and develop rich models of how the world works in their head, and then reason about it and fluently dialog about it and ask questions about it. We refine and compound our understanding through reading and dialoging. At Elemental Cognition, we want our AI to do that.
We want to look beyond the surface structure of language, beyond the patterns that appear in word frequencies, and get at the underlying meaning. From that, we want to be able to build the internal logical models that humans would create and use to reason and communicate. We want to end up with a system that produces a compatible intelligence, one that can autonomously learn and refine its understanding through human interaction, language, dialog, and other related experiences.
Thinking about what knowing and understanding means is a really interesting part of AI. It’s not as easy as providing labeled data for doing image analysis, because what happens is that you and I could read the same thing, but we can come up with very different interpretations. We could argue about what it means to understand that thing. Today’s systems do more text matching and looking at the statistical occurrences of words and phrases, as opposed to developing a layered and logical representation of the complex logic that is really behind the language.
MARTIN FORD: Let’s pause to make sure people grasp the magnitude of this. There are lots of deep learning systems today that can do great pattern recognition and could, for example, find a cat in a picture and tell you there’s a cat in the image. But there is no system in existence that really understands what a cat is, in the way that a person does.
DAVID FERRUCCI: Well yes, but you and I could also argue about what a cat is. That’s the interesting part because it asks what does it mean to actually understand. Think about how much human energy goes into helping each other to develop shared understandings of things. It’s essentially the job of anyone compiling or communicating information, any journalist, artist, manager, or politician. The job is to get other people to understand things the way they understand them. That’s how we as a society can collaborate and advance rapidly.
That’s a difficult problem because in the sciences we’ve developed formal languages that are completely unambiguous for the purposes of producing value. So, engineers use specification languages, while mathematicians and physicists use mathematics to communicate. When we write programs, we have unambiguous formal programming languages. When we talk, though, using natural language, which is where we’re absolutely prolific and where our richest and most nuanced things happen, there it’s very ambiguous and it’s extremely contextual. If I take one sentence out of context, it can mean lots of different things.
It’s not just the context in which the sentence is uttered, it’s also what is in that person’s mind. For you and I to confidently understand each other, it is not enough for me just to say things. You have to ask me questions, and we have to go back and forth and get in sync and align our understandings until we are satisfied that we have a similar model in our heads. That is because the language itself is not the information. The language is a vehicle through which we communicate the model in our heads. That model is independently developed and refined, and then we align them to communicate. This notion of “producing” an understanding is a rich, layered, highly contextual thing that is subjective and collaborative.
A great example was when my daughter was seven years old and doing some school work. She was reading a page in a science book about electricity. The book says that it’s energy that’s created in different ways, such as by water flowing over turbines. It ends by asking my daughter a simple question, “How is the electricity produced?” She looks back at the text, and she’s doing text matching, saying well it says electricity is created and “created” is a synonym of “produced,” and then it has this phrase, “by water flowing over turbines.”
She comes to me and says, "I can answer this question by copying this phrase, but I have no understanding of what electricity is or how it is produced." She didn't understand it at all, even though she could get the question right by doing text matching. We then discussed it and she gained a richer understanding. That is more-or-less how most language AI works today—it doesn't understand. The difference is that my daughter knew she didn't understand. That is interesting. She expected much more from her underlying logical representation. I took that as a sign of intelligence, but I may have been biased in this case. Ha!
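To make the contrast concrete, here is a minimal sketch of the kind of shallow matching described above. The passage, question, and synonym table are invented for illustration; this is not how Watson or Elemental Cognition works, just a toy showing that word overlap can return the "right" phrase with no model of electricity behind it.

```python
# A toy "answer by text matching" sketch. The passage, question, and
# synonym table below are illustrative assumptions, not a real QA system.

SYNONYMS = {"produced": {"created", "generated", "made"}}

def expand(words):
    """Add known synonyms so 'produced' also matches 'created'."""
    out = set(words)
    for w in words:
        out |= SYNONYMS.get(w, set())
    return out

def answer_by_matching(passage, question):
    """Return the passage sentence that shares the most words with the question."""
    q_words = expand(set(question.lower().rstrip("?").split()))
    best, best_overlap = "", 0
    for sentence in passage.split("."):
        s_words = set(sentence.lower().split())
        overlap = len(q_words & s_words)
        if overlap > best_overlap:
            best, best_overlap = sentence.strip(), overlap
    return best

passage = ("Electricity is a form of energy. "
           "Electricity is created by water flowing over turbines.")
question = "How is electricity produced?"

print(answer_by_matching(passage, question))
# -> Electricity is created by water flowing over turbines
# The matching phrase comes back, but nothing here models what electricity is.
```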
It’s one thing to look at the words in a passage and take a guess at the answer. It’s another thing to understand something enough to be able to communicate a rich model of your understanding to someone and then discuss, probe, and get in sync to advance your understanding as a result.
MARTIN FORD: You’re imagining a system that has a genuine understanding of concepts and that can converse and explain its reasoning. Isn’t that human-level artificial intelligence or AGI?
DAVID FERRUCCI: When you can produce a system that can autonomously learn, in other words, a system that can read, understand, and build models, and then converse, explain, and summarize those models to the person it's talking to, then you're approaching more of what I would call holistic intelligence.
As I said, I think there are three parts to a complete AI, perception, control, and knowing. A lot of the stuff that’s going on with deep learning is remarkable regarding the progress that we’re making on the perception and the control pieces, the real issue is the final piece. How do we do the understanding and the collaborative communication with humans so that we can create a shared intelligence? That’s super powerful, because our main means for building, communicating, and compounding knowledge is through our language and building human-compatible models. That’s the AI that I’m endeavoring to create with Elemental Cognition.
MARTIN FORD: Solving the understanding problem is one of the holy grails of AI. Once you have that, other things fall into place. For example, people talk about transfer learning or the ability to take what you know and apply it in another domain, and true understanding implies that. If you really understand something, you should be able to apply it somewhere else.
DAVID FERRUCCI: That’s exactly right. One of the things that we’re doing at Elemental Cognition is testing how a system understands and compounds the knowledge that it reads in even the simplest stories. If it reads a story about soccer, can it then apply that understanding to what’s going on in a lacrosse game or a basketball game? How does it reuse its concepts? Can it produce analogous understandings and explanations for things, having learned one thing and then doing that reasoning by analogy and explaining it in a similar way?
What’s tricky is that humans do both kinds of reasoning. They do what we might think of as statistical machine learning, where they process a lot of data points and then generalize the pattern and apply it. They produce something akin to a trendline in their head and intuit new answers by applying the trend. They might look at some pattern of values and when asked what is next, intuitively say the answer is 5. When people are doing that, they’re doing more pattern matching and extrapolation. Of course, the generalization might be more complicated than a simple trend line, as it certainly can be with deep learning techniques.
But, when people sit down and say, “Let me explain to you why this makes sense to me—the answer is 5 because...,” now they have more of a logical or causal model that they’ve built up in their head, and that becomes a very different kind of information that is ultimately much more powerful. It’s much more powerful for communication, it’s much more powerful for an explanation, and it’s much more powerful for extension because now I could critique it and say, “Wait, I see where your reasoning is faulty,” as opposed to saying “It’s just my intuition based on past data. Trust me.”
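As a rough illustration of the two modes of reasoning just described, the sketch below contrasts them on a made-up sequence of values: the first function extrapolates a fitted trendline and simply reports the next number, while the second encodes an explicit rule that can be stated, inspected, and challenged. The sequence and the rule are assumptions chosen purely for illustration.

```python
# A toy contrast between pattern extrapolation and an explicit, explainable rule.
# The observed values are invented for illustration.
import numpy as np

observed = [1, 2, 3, 4]  # the pattern of values seen so far

def intuit_next(values):
    """Pattern extrapolation: fit a trendline and read off the next point."""
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, deg=1)
    return slope * len(values) + intercept  # no reason offered, just the trend

def explain_next(values):
    """Explicit model: state the rule, so it can be critiqued or corrected."""
    step = values[1] - values[0]
    prediction = values[-1] + step
    return prediction, f"each value increases by {step}, so the next is {prediction}"

print(intuit_next(observed))   # -> 5.0  ("trust me, that's the trend")
print(explain_next(observed))  # -> (5, "each value increases by 1, so the next is 5")
```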
If all I have is inexplicable intuition, then how do I develop, how do I improve, and how do I extend my understanding of the world around me? That's the interesting dilemma I think we face when we contrast these two kinds of intelligences. One that is focused on building a model that is explicable, that you can inspect, debate, explain, and improve on, and one that says, "I count on it because it's right more often than it's wrong." Both are useful, but they're very different. Can you imagine a world where we give up agency to machines that cannot explain their reasoning? That sounds bad to me. Would you like to give agency up to humans that cannot explain their reasoning?
MARTIN FORD: Many people believe that deep learning, that second model that you describe, is enough to take us forward. It sounds like you think we also need other approaches.
DAVID FERRUCCI: I’m not a fanatic one way or the other. Deep learning and neural networks are powerful because they can find nonlinear, very complex functions in large volumes of data. By function, I mean if I want to predict your weight given your height, that could be a very simple function represented by a line. Predicting the weather is less likely to be represented by a simple linear relationship. The behavior of more complex systems is more likely represented by very complex functions over many variables (think curvy and even discontinuous and in many dimensions).
You can give a deep learning system huge amounts of raw data and have it find a complex function, but in the end, you’re still just learning a function. You might further argue that every form of intelligence is essentially learning a function. But unless you endeavor to learn the function that outputs human intelligence itself (what would be the data for that?), then your system may very well produce answers whose reasons are inexplicable.
Imagine I have a machine called a neural network where if I load in enough data, it could find an arbitrarily complex function to map the input to the output. You would think, “Wow! Is there any problem it can’t solve?” Maybe not, but now the issue becomes, do you have enough data to completely represent the phenomenon over all time? When we talk about knowing or understanding, we have first to say, what’s the phenomenon?
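One way to picture that last question is the sketch below, which is purely illustrative: a flexible curve-fitter (a high-degree polynomial standing in for an arbitrarily complex learned function) is trained on data from only part of an invented "phenomenon." It does well where the data covered the phenomenon and fails badly where it did not.

```python
# A minimal sketch of the data-coverage point above. The target function,
# sampling range, and polynomial degree are assumptions made for illustration;
# the polynomial stands in for any flexible function approximator.
import numpy as np

rng = np.random.default_rng(0)

def phenomenon(x):
    """The 'true' phenomenon the learner only ever sees through samples."""
    return np.sin(x)

# Training data covers only part of the input space.
x_train = rng.uniform(0, np.pi, 50)
y_train = phenomenon(x_train) + rng.normal(0, 0.05, 50)

# Fit a very flexible function to that data.
model = np.poly1d(np.polyfit(x_train, y_train, deg=7))

x_inside = 1.0           # inside the range the data covered
x_outside = 3 * np.pi    # far outside it

print(model(x_inside), phenomenon(x_inside))    # close: the data represented this region
print(model(x_outside), phenomenon(x_outside))  # wildly off: the data never covered it
```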