As they worked to lift Watson’s performance, the Jeopardy team focused on entire categories that the machine misunderstood. They called them train wrecks. It was a new genre, conceived after Watson’s debacle against Lindsay. The most insidious train wrecks, Gondek said one afternoon, were those in which Watson was fooled into “trusting” its expertise—generating high confidence scores—in categories where it in fact had no clue. This double ignorance could lead it to lay costly bets, embarrassing the team and losing the match.
Lots of the train wreck categories raised questions about the roots of Watson’s misunderstandings. One category that appeared to confuse it, for example, was Books in Español. Watson didn’t come close to identifying Ernest Hemingway’s Adiós a las Armas, Harper Lee’s Matar un Ruiseñor, or Stephen King’s La Milla Verde. It already held rudimentary foreign words and phrases in its tool kit. But would it benefit from greater detail? As it turned out, Watson’s primitive Spanish wasn’t the problem. The issue was simpler than that. From the name of the category and the bare-bones phrasing of the clues—Stephenie Meyer: Luna Nueva—the computer did not know what to look for. And unlike human contestants, it was deaf to the correct answers. If IBM and Jeopardy ironed out an arrangement to provide Watson with the answers after each clue, it might orient itself in puzzling categories. That way, it could move on to the real challenge of the clue, recognizing titles like To Kill a Mockingbird and A Farewell to Arms in Spanish.
As the season of sparring sessions progressed, people in the observation room paid less attention to the matches as they were being played. They talked more and looked up at the big monitor when they heard laughter or when Watson found itself in a tight match. The patterns of the machine were becoming familiar. For them, much of the excitement came a day later, when they began to analyze the data and saw how the smarter version of Watson handled the troublesome clues. Ferrucci occasionally used the time during the matches to explain Watson’s workings to visitors, or to give interviews. One March morning, he could be heard across the room talking to a documentary producer. Asked if he would be traveling to California for the televised final match, Ferrucci deadpanned: “I’ll be sedated.”
David Gondek, sitting across from Ferrucci, his fingers on his laptop keyboard, said that pressure in the War Room was mounting. He had largely abandoned his commute from Brooklyn and now spent nights in a small apartment he’d rented nearby. It was only ten minutes by bike to the War Room or a half hour to pedal to the Yorktown labs, where the sparring sessions took place.
From the very beginning, Gondek said, the Jeopardy challenge differed from a typical software project. Usually, software developers are given a list of functions and applications to build. Once they’ve built, tested, tweaked, and debugged them, they’re done. Building Watson, however, never ended, he said. There was always something it failed to understand. The work, he said, “is infinite.”
In graduate school, Gondek had focused on data mining. His thesis, on nonredundant clustering, involved programming machines to organize clusters of data around connections that the users might not have considered. By answering some preliminary questions, for example, an intelligence officer might inform the system that he’s all too familiar with Osama bin Laden’s connections to terrorism. So the system, when sorting through a batch of intelligence documents, would find other threads and connections, perhaps leading to fresh insights about the Al Qaeda leader. Machines, much like humans, follow conventional patterns of analysis. Gondek had been thinking about this since listening to a recent talk by a cognitive psychologist. It raised this question: If a machine like Watson fell into the same mental traps as humans, was it a sign of intelligence or just a cluelessness that it happened to share with us? He provided an example.
“What color is snow?” he asked.
“White,” I said.
“A wedding dress?”
“White.”
“Puffy clouds?”
“White.”
“What do cows drink?”
“Milk,” I said, falling obediently into the trap he’d set.
Cows, of course, drink water once they’re weaned. Because humans naturally seek patterns and associations, most of us get into a “white” frame of mind. Psychologists call this the associative network theory. One node in our mind represents “cow,” said Penn State’s Richard Carlson. “It’s related to others, for milk and steak and mooing, and so on.” The mention of “cow,” he said, activates the entire network, priming it. “That way, you’re going to be quicker to respond.”
Gondek’s point was that Watson, unlike most question-answering programs, would fall for the same trick. It focused on patterns and correlations and had a statistical version of an associative network. It was susceptible to being primed for “white.” It was like a human in that narrow way.
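The trap Gondek describes can be sketched as a tiny spreading-activation model. The sketch below is purely illustrative (the association strengths and the priming rule are invented for the demonstration, not drawn from Watson's actual architecture), but it shows how a run of "white" answers can leave enough residual activation on "milk" to crowd out the correct "water":

```python
# Toy spreading-activation model of the "cows drink milk" trap. Each answer
# leaves residual activation that spreads to related concepts, so after a few
# "white" answers the white-associated "milk" outscores the correct "water."
# All strengths here are invented for illustration; this is not Watson's code.

# Cue word -> candidate answers, with made-up base association strengths.
cue_links = {
    "snow": {"white": 0.9},
    "wedding dress": {"white": 0.8},
    "clouds": {"white": 0.7},
    "cows drink": {"water": 0.7, "milk": 0.6},   # water starts out ahead
}

# Concept -> related concepts, used to spread activation (priming).
concept_links = {
    "white": {"milk": 0.8, "snow": 0.5},
    "milk": {"white": 0.8},
}

activation = {}  # lingering activation left over from earlier questions


def spread(source, amount):
    """Add activation to a concept and pass a share of it to its neighbors."""
    activation[source] = activation.get(source, 0.0) + amount
    for neighbor, strength in concept_links.get(source, {}).items():
        activation[neighbor] = activation.get(neighbor, 0.0) + amount * strength


def answer(cue):
    """Pick the candidate with the highest base association plus priming."""
    scores = {c: s + activation.get(c, 0.0) for c, s in cue_links[cue].items()}
    best = max(scores, key=scores.get)
    spread(best, 1.0)   # the chosen concept stays active and primes its neighbors
    return best


for cue in ("snow", "wedding dress", "clouds", "cows drink"):
    print(f"{cue!r} -> {answer(cue)!r}")
# Asked cold, "cows drink" would return "water"; after the "white" run it returns "milk".
```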
University researchers in psychology and computational neuroscience are building computer models to probe these similarities. At Carnegie Mellon, a team under John Anderson, a psychology professor, has come up with a cognitive architecture called ACT-R that simulates human thought processes. Like Watson, it’s a massively parallel system fueled by statistical analysis.
Yet the IBM team resolutely avoided comparisons between Watson’s design and that of a brain. Any claims of higher intelligence on the part of their machine, they knew, would provoke a storm of criticism from psychologists and the AI community alike. It was true that on occasion Watson and the human brain appeared to follow similar patterns. But that, said Gondek, was only because they were programmed, each in its own way, to handle the same job.
A few months later, Greg Lindsay was eating sushi in a small Japanese restaurant near his apartment in Brooklyn Heights. He wore wire-rimmed glasses, and his thinning hair was cut so short that it stood straight up. He had to eat quickly. A book editor was waiting for the manuscript fixes on his book, Aerotropolis. It was about the rise of cities built around airports, and it fit his insatiable hunger for facts. He said he had had to delve deeply into transportation, energy, global manufacturing, and economics. Little surprise, then, that the book was nearly five hundred pages long.
Lindsay said he had assumed that Watson would maul him in the sparring rounds, beating him to the buzzer every time. This hadn’t happened. By anticipating the end of Todd Crain’s delivery of the clue, he had managed to outbuzz Watson a number of times. He also thought that the extra time to answer Final Jeopardy would give Watson plenty of opportunity to find the right answer. This clearly was not the case. In fact, this extra time raised questions among Ferrucci’s team. To date, Watson was answering every question the same way, as if it had the usual three to five seconds, even when it had five or six times as long. That meant that it was forgoing precious time that it could be spending hunting and evaluating potential answers. Would that extra time help? Just a few days before, Gondek had said that he wasn’t sure. With more time, he said, “Watson might bring more wrong answers to the surface and undermine its confidence in the right one.” In other words, the computer ran the risk of overthinking. In the coming months, Gondek and his colleagues thought they might test a couple of other approaches, but they were starting to run out of time.
For his strategy against Watson, Lindsay said, he took a page out of the William Goldman novel The Princess Bride. The hero in the story is facing a fight with a much better swordsman, so he contrives to move the fight to a stony surface, where the rival might slip. In the same way, Lindsay steered Watson to an equally unstable arena: “areas of semantic complexity.” He predicted that humans playing Watson in the television showdown would follow the same strategy.
But there was one big difference. With a million dollars at stake, the humans would not only be battling Watson, they’d also be competing against each other. This could change the dynamics dramatically. In the sparring sessions, the humans (playing with funny money) focused exclusively on the machine. “I didn’t care about the others; I just wanted to beat Watson,” Lindsay said. But as the two humans in the upcoming match probed each other’s weaknesses and raced to buzz preemptively, they could open the door for the third contestant, who would be oblivious to the drama and would go about its business, no doubt, with the unflappable dispatch of a machine.
7. AI
ON A MIDSUMMER afternoon in 2010, a cognitive scientist at MIT named Joshua Tenenbaum took a few minutes to explain why the human brain was superior to a question-answering machine like Watson. He used the most convenient specimen of human cognition at hand, his own mind, to make his case. Tenenbaum, a youthful professor with sandy hair falling across his forehead and an easy smile, has an office in MIT’s imposing headquarters for research on brains, memory, and cognitive science. His window looks across the street at the cascading metallic curves of MIT’s Stata Center, designed by the architect Frank Gehry.
Tenenbaum is focusing his research on the computational basis of human learning and trying to replicate it with machines. His goal is to come up with computers whose intelligence reaches far beyond answering questions or finding correlations in masses of data. One day, he hopes, the systems he’s working on will come up with concepts and theories, the way humans do, sometimes basing them on just a handful of observations. They would make what he called inductive leaps, behaving more like Charles Darwin than, say, Google’s search engine or Watson. Darwin’s data—his studies of worms, pigeons, and a host of other plants and animals—was tiny by today’s standards; it would occupy no more than a few megabytes on a hard drive. Yet he came up with a theory that explained the evolution of life on earth. Could a computer do that?
Tenenbaum was working toward that distant vision, but for the moment his objective was more modest. He thought Watson acted smarter than it was, and he wanted to demonstrate why. He had recently read in a magazine about Watson’s mastery of Jeopardy’s Before and After clues, the ones that linked two concepts or people with a shared word in the middle. When asked about a candy bar that was a Supreme Court justice, Watson had quickly come up with “Who is Baby Ruth Ginsburg.”
Now Tenenbaum was creating a Before and After clue of his own. “How about this one?” he said. “A president who wrote a founding document and later led a rebellion against it.” The answer, a combination of the third president of the United States and the only president of the Confederacy: Thomas Jefferson Davis.
Tenenbaum’s point was that it took a team of gifted engineers to teach Watson how to handle these questions by devising clever algorithms. But humans, after seeing a single example of a Before and After clue, could build on it, not only figuring out how to respond to such questions but inventing new ones. “I know who Ruth Ginsburg is and I know what Baby Ruth is and I see how they overlap, and from that one example I can extract that template,” he said. “I don’t have to be programmed with that question.” We humans, he explained, create our own algorithms on the fly.
As in many fields of science, researchers in Artificial Intelligence have long fallen into two groups, pragmatists and visionaries. And most of the visionaries, including Tenenbaum, argue that machines like Watson merely simulate intelligence by racing through billions of correlations. Watson and its kin don’t really “know” or “understand” anything. Watson can ace Jeopardy clues on Shakespeare, but only because the ones and zeros that spell out “Shakespeare” pop up on lists and documents near other strings of ones and zeros representing playwrights, England, Hamlet, Elizabethan, and so on. It lacks anything resembling awareness. Most reject the suggestion that the clusters of data nestled among its transistors mirror the memories encoded chemically in the human brain or that Watson’s search for Jeopardy answers, and its statistical methods of balancing one candidate answer with another, mimic what goes on in Ken Jennings’s head.
The parallels, Tenenbaum said, are deceiving. Watson, for example, appears to learn. But its learning comes from adjusting its judgments to feedback, moving toward the combinations that produce correct answers and away from errors. These “error-driven learning algorithms,” he said, are derived from experiments in behavioral psychology. “The animals do something, and they’re rewarded or they’re punished,” he said. That kind of learning may be crucial to survival, leading humans and many animals alike to recoil from flames, coiled snakes, and bitter, potentially poisonous, berries. But this describes a primitive level of brain function. What’s more, Watson’s learning laboratory was limited, extending only to its 75 gigabytes of data and the instructions of its algorithms. Outside that universe, Tenenbaum stressed, Watson knew nothing. And it formed no theories.
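The "error-driven learning" Tenenbaum describes boils down to nudging weights toward whatever reduces the error on each example. Here is a minimal sketch, assuming a generic logistic-regression style update rather than anything from Watson's actual training pipeline:

```python
# Minimal error-driven learning: one weight per evidence feature, nudged after
# each example in the direction that shrinks the error. A generic logistic
# update for illustration only, not Watson's training code.
import math

def train(examples, epochs=200, lr=0.1):
    """examples: list of (features, label) pairs, with label 0 or 1."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1.0 / (1.0 + math.exp(-score))  # confidence in label 1
            error = label - prediction                    # the reward/punish signal
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Two made-up evidence features for a candidate answer; only the first matters.
data = [([1.0, 0.3], 1), ([0.9, 0.8], 1), ([0.1, 0.9], 0), ([0.2, 0.2], 0)]
weights, bias = train(data)
print("learned weights:", [round(w, 2) for w in weights], "bias:", round(bias, 2))
```

The critique stands in miniature here: the procedure drifts toward rewarded combinations and away from punished ones, but it knows nothing outside the examples it is fed, and it forms no theories about them.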
Ferrucci didn’t disagree. Watson had its limitations. One time, when Ferrucci learned that another scientist had disparaged Watson as an “idiot savant,” he said, “Idiot savant? I’ll take it!” While he objected to the “idiot” in that term, which he viewed as demeaning, Ferrucci said he only wished that Watson could approach the question-answering mastery of humans like Kim Peek, the model for the so-called megasavant played by Dustin Hoffman in the movie Rain Man. Peek, who died in 2009, was a walking encyclopedia. He had read voluminously and seemed to recall every detail with precision. Yet he had grave physical and developmental shortcomings. His brain was missing the corpus callosum, the bundle of nerves connecting the two hemispheres. He had little meaningful interaction with people—with the exception of his father—and he did not appear to draw sophisticated conclusions from his facts, much less come up with theories. He was a stunted genius. But unlike Watson, he was entirely fluent in language. As far as Ferrucci was concerned, a Q-A machine with the language proficiency of a human was a dream. It would have boundless market potential. He would leave it to other kinds of machines to come up with theories.
The question was whether computers like Watson, products of this pragmatic, problem-solving (and profit-seeking) side of the AI world, were on a path toward higher intelligence. Within a decade, computers would likely run five hundred times as fast and would race through databases a thousand times as large. Studies predicted that within fifteen years a single supercomputer would be able to carry out 10²⁰ calculations per second. This was enough computing power to count every grain of sand on earth in a single second (assuming it didn’t have more interesting work to do). At the same time, the algorithms running such machines, each one resulting from decades of rigorous Darwinian sifting, would be smarter and more precise. Would these supercharged descendants of Watson still be in the business of simulating intelligence? Or could they make the leap to a human level, then advance beyond?
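The grain-of-sand comparison holds up on the back of an envelope. Taking the often-cited estimate of roughly 7.5 × 10¹⁸ grains of sand on the world's beaches (an outside figure, not one from the studies mentioned above) and assuming one count per operation:

\[
\frac{7.5 \times 10^{18}\ \text{grains}}{10^{20}\ \text{counts per second}} \approx 0.08\ \text{seconds}
\]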
The AI community was full of doubters. And their concerns about the limitations of statistical crunchers like Watson stirred plenty of debate within the scientific community. Going back decades, the sparkling vision of AI was to develop machines that could think, know, and learn. Watson, many argued, landed its star spot on national television without accomplishing any of those goals. A human answering a Jeopardy question draws on “layers and layers of knowledge,” said MIT’s Sajit Rao. “There’s so much knowledge around every single word.” Watson couldn’t compare. “If you ask Watson what time it is,” wrote one computer scientist in an e-mail, “it won’t have an answer.”
If Watson hadn’t been so big, few would have cared. But the size and scope of the project, and the razzmatazz surrounding it, fueled resentment. Big Blue was a leading force in AI, and its focus on Jeopardy funneled more research dollars toward its statistical approach. What’s more, Watson was sure to hog the press. The publicity leading up to the man-machine Jeopardy showdown would likely shine a brighter spotlight on AI than anything since the 1997 chess matches between Garry Kasparov and Deep Blue. Yet the world would see, and perhaps fall in love with, a machine that only simulated intelligence. In many respects, it was dumb. And despite its mastery of statistics, it knew nothing. Worse, if Watson—despite these drawbacks—proved to be an effective and versatile knowledge machine, it might spell the end of competing technologies, turning years of research—entire careers—into dead ends. The final irony: At least a few scientists kept their criticism of Watson private for fear of alienating Big Blue, a potential sponsor of their research.
In sum, from a skeptic’s view, the machine was too dumb, too ignorant, too famous, and too rich. (In that sense, IBM’s computer resembled lots of other television stars. And, interestingly enough, the resentment within the field mirrored the combination of envy and contempt that serious actors feel for the celebrities on reality TV.)
These shortcomings aside, Watson had one quality that few could ignore. In the broad realm of Jeopardy, it worked. It made sense of most of the clues, even those in complex English, and it came up with answers within a few seconds. The question was whether other lines of research in AI would surpass it—or perhaps one day endow a machine with the human smarts or expertise that it lacked.
Dividing the pragmatists like Ferrucci and the idealists within AI was the human brain. For many, including Tenenbaum, the path toward true machine intelligence had less to do with the power of the computer than with the nature of its instructions and architecture. Only the brain, they believed, held the keys to higher levels of thinking—to concepts, ideas, and theories. But those keys were tangled up in the most complex circuitry known in the universe.
Tenenbaum compared the effort required to build theorizing and idea-spouting machines with the American push, a half century earlier, to send a manned mission to the moon. The moon shot, he said, was far easier. When President Kennedy issued his call for a lunar mission in May 1961, most of the basic scientific research had already been accomplished. Indeed, the march toward space travel had begun early in the seventeenth century, when Galileo started to write down the mathematical equations describing how certain objects moved. This advanced through the Scientific and Industrial Revolutions, from the physics of Newton to the harnessing of electricity, the understanding of chemical bonds, the development of powerful fuels, the creation of metal alloys, and, finally, advances in rocket technology. By the 1960s, the basic science behind sending a spaceship to the moon was largely complete. Much of the technology existed. It was up to the engineers to assemble the pieces, build them to the proper scale, and send the finished spacecraft skyward.