For humans, distinguishing President George Washington from the bridge named after him wasn’t much of a challenge. Context made it clear. Bridges didn’t deliver inaugural addresses; presidents were rarely jammed at rush hour, with half-hour delays from New Jersey. What’s more, when placed in sentences, people usually behaved differently than roads or bridges.
But what was simple for us involved hard work for a Q-A computer. It had to comb through the structure of the question, picking out the subjects, objects, and prepositions. Then it had to consult exhaustive reference lists that had been built up in the industry over decades, laying out hundreds of thousands of places, things, and actions and the web of relationships among them. These were known as “ontologies.” Think of them as cheat sheets for computers. If a finger was a subject, for example, it fell into human anatomy and was related to the hand and the thumb and to verbs such as “to point” and “to pluck.” (Conversely, when “the finger” turned up as the object of the verb “to give,” a sophisticated ontology might steer the computer toward the neighborhood of insults, gestures, and obscenities.)
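To make the idea concrete, here is a minimal sketch of what such a “cheat sheet” might look like in code. The entries, types, and relations below are invented for illustration; the real ontologies of the era ran to hundreds of thousands of entries.

```python
# A toy ontology: each entity carries a type and a small web of relations,
# so a program can tell the president from the bridge by what each one does.
# Entries and relations here are invented for illustration only.
ontology = {
    "George Washington": {
        "type": "Person/President",
        "relations": {"delivered": ["inaugural address"], "led": ["Continental Army"]},
    },
    "George Washington Bridge": {
        "type": "Structure/Bridge",
        "relations": {"connects": ["Manhattan", "New Jersey"], "carries": ["rush-hour traffic"]},
    },
    "finger": {
        "type": "Anatomy/BodyPart",
        "relations": {"related_to": ["hand", "thumb"], "used_for": ["to point", "to pluck"]},
    },
}

def entities_related_to(phrase):
    """Crude disambiguation: return entities whose relations mention the phrase."""
    return [name for name, entry in ontology.items()
            if any(phrase in values for values in entry["relations"].values())]

print(entities_related_to("inaugural address"))  # ['George Washington'], not the bridge
```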
In any case, Fan needed both a type system and a knowledge base to understand questions and hunt for answers. He didn’t have either, so he took a hacker’s shortcut and used Google and Wikipedia. (While the true Jeopardy computer would have to store its knowledge in its “head,” prototypes like Fan’s were free to search the Web.) From time to time, Fan found, if he typed a clue into Google, it led him to a Wikipedia page—and the subject of the page turned out to be the answer. The following clue, for example, would confound even the most linguistically adept computer. In the category The Author Twitters, it read: “Czech out my short story ‘A Hunger Artist’! Tweet done. Max Brod, pls burn my laptop.” A good human Jeopardy player would see past the crazy syntax, quickly recognizing the short story as one written by Franz Kafka, along with a reference to Kafka’s Czech nationality and his longtime associate Max Brod.
In the same way, a search engine would zero in on those helpful keywords and pay scant attention to the sentence surrounding them. When Fan typed the clue into Google, the first Wikipedia page that popped up was “Franz Kafka,” the correct answer. This was a primitive method. And Fan knew that a computer relying on it would botch the great majority of Jeopardy clues. It would crash and burn in a game against even ignorant humans, let alone Ken Jennings. But one or two times out of ten, it worked. For Fan, it was a start.
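As a rough sketch of that shortcut (not Fan’s actual Basement Baseline code), the snippet below sends a clue to Wikipedia’s public full-text search endpoint and takes the title of the top hit as the candidate answer. The endpoint is the standard MediaWiki search API; everything else is a simplification.

```python
# A rough sketch of the shortcut: search Wikipedia with the clue text and take
# the title of the top hit as the candidate answer. Not Fan's actual code --
# just an illustration of why it works only one or two times out of ten.
import requests

def top_wikipedia_title(clue):
    """Return the title of the best-matching Wikipedia page, or None."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "list": "search", "srsearch": clue,
                "srlimit": 1, "format": "json"},
        timeout=10,
    )
    hits = resp.json()["query"]["search"]
    return hits[0]["title"] if hits else None

clue = ("Czech out my short story 'A Hunger Artist'! "
        "Tweet done. Max Brod, pls burn my laptop.")
print(top_wikipedia_title(clue))  # with luck, something like 'Franz Kafka'
```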
The month passed. Fan added more features to Basement Baseline. But at the end, the system was still missing vital components. Most important, it had no mechanism for gauging its level of confidence in its answers. “I didn’t have time to build one,” Fan said. This meant the computer didn’t know what it knew. In a game, it wouldn’t have any idea when to buzz. Fan could conceivably have programmed it with simple rules. It could be instructed to buzz all the time—a serious money loser, considering it flubbed two clues for every one it got right. Or he could have programmed it to buzz in every category in which it got the first clue right. That would signal that it was oriented to the category. But his machine didn’t have any way to learn that its response was right or wrong. It lacked a feedback loop. In the end, Fan blew off game strategy entirely and focused simply on building a machine that could answer Jeopardy clues.
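The arithmetic behind “a serious money loser” is easy to check. In Jeopardy a wrong answer deducts the clue’s full value, so at the roughly one-right-in-three rate Basement Baseline managed, buzzing on everything has a negative expected value. A back-of-the-envelope sketch:

```python
# Why "buzz on everything" loses money: a wrong answer deducts the clue's full
# value, and the prototype got roughly one clue in three right.
def expected_winnings(clue_value, p_correct):
    """Expected dollars from buzzing: win the value if right, lose it if wrong."""
    return p_correct * clue_value - (1 - p_correct) * clue_value

print(expected_winnings(1000, 1 / 3))  # roughly -333 dollars per $1,000 clue
```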
It soon became clear that the bake-off, beyond a test of technologies, also amounted to a theater production staged by David Ferrucci. It was tied to inside politics. Ferrucci didn’t believe that the Piquant platform could ever be adapted to Jeopardy. It wasn’t big or robust enough. Yet there were expectations within the company that Piquant, which represented more than twenty researcher years, would play an important role. To build the far bigger machine he envisioned, Ferrucci needed to free himself, and the project, from the old guard’s legacy. For this, Piquant had to fail. He didn’t spell this out. But he certainly didn’t give the team the guidance, or the time, to overhaul the system. So besides training the machine on five hundred Jeopardy clues and teaching it to answer them in the form of questions, the Piquant team left the system largely unchanged. “You could have guessed from the outset that the success rate was not going to be very high,” said Jennifer Chu-Carroll, a member of the team. Piquant was being led to a public execution.
The bake-off took place on a March morning at the Hawthorne lab. The results, from Ferrucci’s perspective, were ideal. The Piquant system succeeded on only 30 percent of the clues, far below the level needed for Jeopardy. It had high confidence on only 5 percent of them, and of those it got only 47 percent right. Fan’s Basement Baseline fared almost as well by a number of measures but was still woefully short of what was needed. Fan proved that a hacker’s concoction was far from Jeopardy standards—which was a relief. But by nearly matching the company’s state-of-the-art in Q-A technology, he highlighted its inadequacies.
The Jeopardy challenge, it was clear, would require another program, another technology platform, and a far bolder approach. Ferrucci wouldn’t hesitate to lift algorithms and ideas from both Piquant and Basement Baseline, but the project demanded far more than a recasting of IBM technologies. It was too big for a single company, even one as burly as IBM. The Blue J machine, Ferrucci said, would need “the most sophisticated intelligence architecture the world has ever seen.” For this, the Jeopardy team would have to reach out to the universities doing the most exciting work in AI, including MIT, Carnegie Mellon, and the University of Texas. “We needed all the brains we could get behind this project,” he said.
Back in the late ’70s, when he was commuting from the Bronx to his high school in suburban New Rochelle, Ferrucci and his best friend at the time, Tony Marciano, had an idea for a new type of machine. They called it “a reverse-dictionary.” The idea, Ferrucci said, was to build a machine that could find elusive words. “You know how it is when you want to express something, but you can’t think of the right word for it? A dictionary doesn’t help at all, because you don’t know what to look up. We wanted to build the machine that would give you the word.” This was before they’d ever seen a computer. “We were thinking of a mechanical thing.”
It sounded like a thesaurus. But Ferrucci bridled a bit at the suggestion that his dream machine had existed for centuries as a book. “No, you don’t give it the synonyms, just the definition,” he said. “Basically we were scratching this idea that the computer could understand your meaning, your words, your definitions, and could come up with the word.”
Ferrucci was a hotshot at science at Iona Grammar School, a Catholic boys’ school. He and Marciano—who, according to Ferrucci, “did calculus on his cuff links”—regarded even their own brains as machines. Marciano, for example, had the idea that devoting brain space to memory storage was wasteful. It distracted neurons from the more important work of processing ideas. So when people asked him questions requiring recall, he would respond, “Ask Dave. He’s willing to use memory.”
Ferrucci’s father, Antonio, had come to the United States from Italy after the Second World War. He had studied some law and engineering in the dying days of Mussolini’s regime, but he arrived in New York without a profession and ended up driving trucks and working in construction. He and Ferrucci’s mother, Connie, wanted their son to be a doctor. One summer during high school, Ferrucci had planned just to hang around with his friends and “play.” His father wouldn’t stand for it. “He’d gotten something in the mail about a math and computer course at Iona College. He says, ‘You’ve got the grades, why don’t you sign up for that?’”
At Iona, Ferrucci came face-to-face with his first computer. It featured a hulking cathode ray tube with a black screen and processed data encoded on teletype. He fell for it immediately. “Here was a machine,” he said. “You told it to do stuff, and it did what you told it. I thought, ‘This is big.’ I called up Tony Marciano, and I said, ‘You get your butt over here, into this room at Iona College. You’ve got to see this machine.’”
Marciano, who later studied computer science and went on to become a finance professor at New York University’s Stern Business School, met Ferrucci later that afternoon. The two of them stayed long into the evening, paging through a single manual, trying out programs on the computer and getting the machine to spit out differential equations. At that point, Ferrucci knew that he wanted to work with computers. However, he didn’t consider it a stand-alone career. A computer was a tool, as he saw it, not a destination. Anyway, he was going to be a doctor.
He went on to Manhattan College, a small Catholic school that was actually in the Bronx, a few miles north of Manhattan. There he followed the pre-med track as a biology major and took computer science on the side. “I did a bunch of programming for the physiology lab,” he said. “Everything I did in biology I kept relating to computers.” The way technology was advancing, it seemed, there had to be a place for computers in medicine.
One night, Ferrucci was taking a practice exam in a course for the MCAT, the Medical College Admission Test. “I was with all my pre-med friends,” he said. “This is midway through the course. The proctor says, ‘Open to page 10 and start taking the sample chemistry test.’ I opened it up and I started doing the questions, and all of a sudden I said, ‘You know what? I’m not going to be a doctor!’ And I closed the test and I went up to the proctor and I said, ‘I’m quitting. I don’t want to be a doctor.’ He said, ‘You’re not going to get your $500 back.’ I said, ‘Whatever.’”
Ferrucci left the building and made two phone calls. He dialed the easier one first, telling his girlfriend that he’d just walked out of the MCAT class and was giving up on medicine. Then he called his father. “That was a hard call to make,” Ferrucci said. “He was very upset in the beginning.”
His MCAT insight, while steering him away from medicine, didn’t put him on another clear path. He still didn’t know what to do. “I started looking for graduate programs in physiology that had a strong computing component,” he said. “After about a week or two of that, I suddenly said, ‘Wait a minute.’” He called this his “second-level epiphany.” He asked himself why he was avoiding the obvious. “I was really interested in the computer stuff,” he said, “not the physiology. So I’d have to make a complete break.” He applied to graduate school in computer science and went upstate, to Rensselaer Polytechnic Institute (RPI), in Troy, New York.
In his first stint at IBM Research, between getting his master’s and his doctorate at RPI, Ferrucci delved into AI. By that time, in the late ’80s, the industry had split into two factions. While some scientists still pursued the initial goal of thinking machines, or general intelligence, others looked for more focused applications that could handle real jobs (and justify the research). The king of “narrow AI,” and Ferrucci’s focus, was the expert system. The idea was to develop smart software for a specific industry. A program designed, say, for the travel industry could answer questions about Disneyland or Paris, find cheap flights, and book hotels. These specialists wouldn’t have to puzzle out the context of people’s conversations. The focus of their domains would make it clear. For that electronic expert in travel, for example, “room” would mean only one thing. The computer wouldn’t have to concern itself with “room” in the backseat of a Cadillac or “room” to explore in the undergraduate curriculum at Bryn Mawr. If it were asked about such things, it would draw a blank. Computers that lacked range and flexibility were known as brittle. The one-trick ponies known as expert systems practically defined the term. Many in the industry didn’t consider them AI at all. They certainly didn’t think or act like people.
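A caricature of such a system, with invented rules and canned answers, shows both the approach and the brittleness:

```python
# A caricature of a narrow expert system: hand-written if/then rules for one
# domain. The rules and answers are invented for illustration. Inside its
# domain, "room" can only mean a hotel room; outside it, the system goes blank.
TRAVEL_RULES = [
    (lambda q: "room" in q and "paris" in q,
     "The Hotel Lutetia has rooms available; shall I book one?"),
    (lambda q: "cheap flight" in q,
     "The lowest fare I know of departs Tuesday morning."),
    (lambda q: "disneyland" in q,
     "Disneyland is open 9 a.m. to 10 p.m.; tickets can be bundled with your hotel."),
]

def travel_expert(question):
    q = question.lower()
    for condition, answer in TRAVEL_RULES:
        if condition(q):
            return answer
    return "I don't understand the question."  # brittle: anything off-domain draws a blank

print(travel_expert("Can I get a room in Paris next weekend?"))
print(travel_expert("Is there room in Bryn Mawr's undergraduate curriculum?"))  # draws a blank
```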
To build a more ambitious thinking machine, some looked to the architecture of the human brain. Indeed, while Ferrucci was grappling with expert systems, other researchers were piecing together an altogether different species of program, called “neural networks.” The idea had been bouncing around at least since 1948, when Alan Turing outlined it in a paper called “Intelligent Machinery.” Like much of his thinking, Turing’s paper was largely theoretical. Computers in his day, with vacuum tubes switching the current on and off, were too primitive to handle such work. (He died in 1954, the year that Texas Instruments produced the first silicon transistor.) However, by the ’80s, computers were up to the job. Based on rudimentary models of neurons, these networks analyzed the behavior of complex systems, such as financial markets and global weather, and used statistical analysis to predict how they would behave over time.
A neural network functioned a bit like a chorus. Picture a sing-along concert of Handel’s Messiah in Carnegie Hall. Some five thousand people show up, each one wearing a microphone. You play the music over loudspeakers and distribute musical scores. That’s the data input. Most of the people start singing while others merely hum or chat with their neighbors. In a neural net, the learning algorithm picks out the neurons that appear to be replicating the pattern, and it gives them more sway. This would be like turning up the microphones of the people who are singing well, turning down the mikes of those who sing a tad off key—and shutting out the chatterers altogether. The net focuses not only on the individuals but on the connections among them. In this analogy, perhaps the singers start to pay attention to one another and organize, the tenors in one section, sopranos in another. By the end of a long training process, the Carnegie Hall network both interprets the data and develops an expertise in Handel’s motifs and musical structure. The next week, when the music switches to Gershwin, new patterns emerge. Some of the chatterers, whose mikes were turned off, become stars. With time, this assemblage can identify new pieces of music, recognizing similar themes and variations. And the group might even set off an alarm if the director gets confused and starts playing Vivaldi instead of Handel.
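Here is a bare-bones sketch of the “turn up the good microphones” step, using a single linear neuron and the classic delta rule—a drastic simplification of what 1980s networks actually did, with invented data. Inputs that help match the target pattern gain weight; the chatterer’s microphone fades toward zero.

```python
# The "turn up the microphones" idea in miniature: one artificial neuron, many
# inputs (singers). After each example, inputs that helped match the target
# pattern gain weight; those that hurt lose it. A bare-bones delta rule,
# simplified from 1980s-era network training; the data are invented.
import random

def train(examples, n_inputs, rate=0.1, epochs=50):
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(epochs):
        for inputs, target in examples:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = target - output
            # nudge each weight in proportion to how much its input contributed
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights

# Three "singers": the first two track the melody (the target); the third chatters at random.
examples = []
for _ in range(100):
    melody = random.choice([0.0, 1.0])
    chatter = random.random()
    examples.append(([melody, melody, chatter], melody))

weights = train(examples, n_inputs=3)
print([round(w, 2) for w in weights])  # the chatterer's weight ends up near zero
```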
Neural networks learned, and even evolved. In that sense, they crudely mimicked the human brain. People driving cars, for example, grow to respond to different patterns—the movement of traffic, the interplay between the wheel and the accelerator—often without thinking. These flows are reflected by neural connections in the brain, lots of them working in parallel. They’re reinforced every time an experience proves their usefulness. But a change, perhaps a glimpse of a cyclist riding against traffic, snaps them from their reverie. In much the same way, neural networks became very good at spotting anomalies. Credit card companies began to use them to note unexpected behavior—an apparent teetotaler buying $500 of Finnish vodka or a frugal Nebraskan renting luxury suites in Singapore. Various industries, meanwhile, used neural networks to look ahead. As long as the future stayed true to the past—not always a safe assumption, as any mortgage banker can attest—they could make solid predictions.
Unlike the brittle expert systems, neural networks were supple. They specialized in pattern detection, not a series of if/then commands. They never choked on changes in the data but simply adjusted. While expert systems processed data sequentially, as if following a recipe, the electronic neurons crunched in unison—in parallel. Their weakness? Since these collections of artificial neurons learned by themselves, it was nearly impossible to figure out how they reached their conclusions or to understand what they were picking up about the world. A neural net was a black box.
By the time Ferrucci returned to IBM Research, in 1995, he was looking beyond expert systems and neural nets. In his spare time, he and a colleague from RPI, Selmer Bringsjord, were building a machine called Brutus, which wrote fiction. And they were writing a book about their machine, Artificial Intelligence and Literary Creativity. Brutus, they wrote, is “utterly devoid of emotion, but he nonetheless seems to have within his reach things that touch not only our minds, but our heart.”
The idea for the program, Ferrucci later said, came when Bringsjord asked him if a machine could create its own story line. Ferrucci took up the challenge. Instead of teaching the machine to dream up plots, he programmed it with about a dozen themes, from betrayal to revenge. For each theme, the machine was first given a series of literary examples and then a program to develop stories along those lines. One of its models for betrayal was Shakespeare’s Julius Caesar (the program was named for Caesar’s confidant-turned-conspirator, Brutus). The program produced serviceable plots, but they were less than riveting. “The one thing it couldn’t do is figure out if something was interesting,” Ferrucci said. “Machines don’t understand that.”
In his day job, Ferrucci was teaching computers more practical lessons. As head of Semantic Analysis and Integration at IBM, he was trying to instruct them to make sense of human communication. On the Internet, records of our words and activities were proliferating as never before. Companies—IBM and its customers alike—needed tools to interpret these new streams of information and put them to work. Ideally, an IBM program would tell a manager what customers or employees were saying or thinking as well as what trends and insights to draw from them and perhaps what decisions to make.
Within IBM itself, some two hundred researchers were developing a host of technologies to mine what humans were writing and saying. But each one operated within its own specialty. Some parsed sentences, analyzing the grammar and vocabulary. Others hunted Google-style for keywords and Web links. Some constructed massive databases and ontologies to organize this knowledge. A number of them continued to hone expert systems and neural networks. Meanwhile, the members of the Q-A team coached their computer for the annual TREC competitions. “We had lots of different pockets of researchers working on these different analytical algorithms,” Ferrucci said. “But any time you wanted to combine them, you had a problem.” There was simply no good way to do it.
In the early 2000s, Ferrucci and his team put together a system to unify these diverse technologies. It was called UIMA, Unstructured Information Management Architecture. It was tempting to think of UIMA as a single brain and all of the different specialties, from semantic analysis to fact-checking, as cognitive regions. But Ferrucci maintained that UIMA had no intelligence of its own. “It was just plumbing,” he said. Idle plumbing, in fact, because for years it went largely unused.
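UIMA itself is a Java framework built around a shared “common analysis structure” that every component reads and annotates. The sketch below is a much-simplified Python caricature of that plumbing idea, with invented annotators; it is not the real UIMA API.

```python
# A caricature of the "plumbing" idea behind UIMA: each analyzer reads a shared
# analysis object and adds its own annotations, so independent components can
# be chained without knowing about one another. Annotator names and fields are
# invented for illustration; the real framework is far more elaborate.
def tokenizer(cas):
    cas["tokens"] = cas["text"].split()
    return cas

def keyword_spotter(cas):
    cas["keywords"] = [t for t in cas.get("tokens", []) if t.istitle()]
    return cas

def sentiment_guesser(cas):
    cas["sentiment"] = "positive" if "great" in cas.get("tokens", []) else "neutral"
    return cas

def run_pipeline(text, annotators):
    cas = {"text": text}  # the shared "common analysis structure"
    for annotate in annotators:
        cas = annotate(cas)
    return cas

print(run_pipeline("Max Brod thought Kafka was great",
                   [tokenizer, keyword_spotter, sentiment_guesser]))
```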