“At that point it was over,” Ferrucci said. “We all knew it.” The machine had triumphed. In the few clues that remained, Rutter and Jennings battled for second place. In the end, as the computer and the two humans revealed their Final Jeopardy responses to a clue about the author of Dracula, Bram Stoker, Jennings added a postscript on his card: “I, for one, welcome our new computer overlords.”
Watson, despite a few embarrassing gaffes, appeared to be just that, at least in the domain of Jeopardy. It dominated both halves of the double match, reaching a total of $77,147. Jennings finished a distant second, with $24,000, just ahead of Rutter, with $21,600.
The audience filed out of the auditorium. Night had fallen. The lobby, its massive Saarinen windows looking out on snow-blanketed fields, was now decked out for a feast. Waiters circulated with beer and wine, shrimp cocktails, miniature enchiladas, and tiny stalks of asparagus wrapped in steak. The home team had won and the celebration was on, with one caveat: Everyone in the festive crowd was sworn to secrecy until the televised match a month later.
Two days later, Alex Trebek was back home in Los Angeles’ San Fernando Valley. He was unhappy about the exhibition match. His chief complaint was that IBM had unveiled one version of Watson for the practice rounds and then tweaked its strategy for the match. “I think that was wrong of IBM,” he said. “It really pissed me off.” For Trebek, the change was tantamount to upping a car’s octane before a NASCAR race. “IBM didn’t need to do that,” he said. “They probably would have won anyway. But they were scared.” He added that after the match was over, “I felt bad for the guys, because I felt they had been jobbed just a little.” Jennings, while disappointed, said he also had masked certain aspects of his strategy during the practice games and didn’t see why Watson couldn’t do the same. Rutter said that “some gamesmanship was going on. But there’s nothing illegal about that.”
Ferrucci, for his part, said that during practice sessions his team was focused on the technical details of Watson’s operations, making sure, for example, that after each clue it was getting the electronic feed of the correct response. Jennings and Rutter, he said, had already seen Watson hunting for Daily Doubles in the videos of the sparring rounds that they’d received months earlier. “Every respectable Jeopardy player knows how to hunt for them,” he added. Was Watson supposed to play dumb?
Fourteen years earlier, Garry Kasparov had registered a complaint similar to Trebek’s after succumbing to Deep Blue in his epic chess match. He objected to the adjustments that the IBM engineers had made to the program in response to what they were learning about his style of play. These disagreements were rooted in questions about the role of human beings in man-machine matches. It was clear that Watson and Deep Blue were on their own as they played. But did they also need to map out their own game strategies? Was that part of the Grand Challenge? IBM in both cases would say no. Jennings and Rutter, on that Friday afternoon in Yorktown Heights, were in fact playing against an entire team of IBM researchers, and the collective intelligence of those twenty-five Ph.D.s was represented on the stage by a machine.
In that sense, it almost seemed unfair. It certainly did to Trebek, who also complained about Watson’s blazing speed and precision on the buzzer. But consider the history. Only three years earlier, Blue J—as Watson was then known—fared worse on Jeopardy clues than an average twelve-year-old. And no one back then would have thought to complain about its buzzer reflexes, not when the machine struggled for nearly two hours to respond to a single clue. Since then, the engineers had led their computer up a spectacular learning curve—to the point where the former dullard now appeared to have an unfair advantage.
And yet Watson, for all its virtues, was still flawed. Its victory was no sure bet. Through the fall, it lost nearly one in three matches to players a notch or two below Jennings and Rutter. A couple of train-wreck categories in the final game could have spelled defeat. Even late in the second game, Jennings could have stormed back. If he had won that last Daily Double, Trebek said, “he could have put significant pressure on Watson.” After the match, Jennings and Rutter stressed that the computer still had cognitive catching up to do. Both agreed that if Jeopardy had been a written test—a measure of knowledge, not speed—they would have outperformed Watson. “It was its buzzer that killed us,” Rutter said.
Looking back, it was fortunate for IBM that Jeopardy had insisted on building a finger for Watson so that it could press the physical buzzer. This demand ten months earlier had initially irked Ferrucci, who worried that Jeopardy’s executives would continue to call for changes in their push for a balanced match. But if Watson had beaten Jennings and Rutter to the buzz with its original (and faster) electronic signal, the match certainly would have been widely viewed as unfair—just as Harry Friedman and his team had warned all along.
Still, despite Watson’s virtuosity with the buzzer and its remarkable performance on Jeopardy clues, the machine’s education is far from complete. As this question-answering technology expands from its quiz show roots into the rest of our lives, engineers at IBM and elsewhere must sharpen its understanding of contextual language. And they will. Smarter machines will not call Toronto a U.S. city, and they will recognize the word “missing” as the salient fact in any discussion of George Eyser’s leg. Watson represents merely a step in the development of smart machines. Its answering prowess, so formidable on a winter afternoon in 2011, will no doubt seem quaint in a surprisingly short time.
Two months before the match, Ken Jennings sat in the empty Wheel of Fortune studio on the Sony lot, thinking about a world teeming with ever-smarter computers. “It does make me a little sad that a know-it-all like me is not the public utility that he used to be,” he said. “There used to be a guy in every office, and everyone would know which cubicle you would go to find out things. ‘What’s the name of the bassist in that band again?’ Or ‘What’s the movie where … ?’ Or ‘Who’s that guy on the TV show … he’s got the mustache?’ You always know who the guy to ask is, right?”
I knew how he felt. And it hit me harder after the match, as I made my way from the giddy reception through a long, narrow corridor toward the non-VIP parking lot. Halfway down, in an office strewn with wires and cameras, stood a discouraged Jennings and Rutter. They were waiting to be filmed for their postgame reflections. It had been a long and draining experience for them. What’s more, the entire proceeding had been a tribute to the machine. Even the crowd was pulling for it. “We were the away team,” Jennings said. And in the end, the machine delivered a drubbing.
Yet I couldn’t regret the outcome. I’d come to know and appreciate the other people in this drama, the ones who had devoted four years to building this computer. For them, a loss would have been even more devastating than it was for Jennings and Rutter. And unlike the two Jeopardy stars, the researchers had to worry about what would come next. Following a loss, there would be extraordinary pressure to fine-tune the machine for a rematch. Watson, like Deep Blue, wasn’t likely to retire from the game without winning. The machine could always get smarter. This meant that instead of a deliverance from Jeopardy, the team might be plunged back into it. This time, though, instead of a fun and unprecedented event, it would have the grim feel of a do-or-die revenge match. For everyone concerned, it was time to move on. Ferrucci, his team, and their machine all had other horizons to explore. I did too.
But the time I spent with Watson’s programmers led me to think more than ever about the programming of our own minds. Of course, we’ve had to adapt our knowledge and skills for millennia. Many of us have decided, somewhere along the way, that we don’t need to know how to trap a bear, till a field, carry out long division, or read a map. But now, as Jennings points out, the value of knowledge itself is in flux. In a sense, each of us faces the question that IBM’s Jeopardy team grappled with as they outfitted Watson with gigabytes of data and operating instructions. What makes sense to store up there? And what cognitive work should be farmed out to computers?
The solution, from a purely practical view, is to fine-tune the mind for the jobs and skills in which the Watsons of the world still struggle: the generation of ideas, concepts, art, and humor. Yet even in these areas, the boundaries between humans and their machines are subject to change. When Watson and its kin scour databases to come up with hypotheses, they’re taking a step toward the realm of ideas. And when Watson’s avatar builder, Joshua Davis, creates his works of generative art, who’s to say that the computer doesn’t have a hand in the design? In the end, each of us will calibrate our own blends of intelligence and creativity, looking for more help, as the years pass, from ever-smarter computers.
But just because we’re living with these machines doesn’t mean that we have to program ourselves by their remorseless logic. Our minds, after all, are far more than tools. In the end, some of us may choose to continue hoarding facts; we are curious animals. Beyond that, one purpose of smart machines is to free us up to do the thousand and one things that only humans enjoy, from singing and swimming to falling in love. These are the opportunities that come from belonging to a species—our species—as it gets smarter. It has its upside.
Acknowledgments
A year ago, I was anxiously waiting for a response to a book proposal. I had high hopes for it, and was disappointed when my marvelous editor at Houghton Mifflin Harcourt, Amanda Cook, told me to look for another project. We’d find something better, she said. It turned out she was right. I’m thankful for her guidance in this book. She’s had a clear vision for it all along. Her notes in the margins of the manuscript are snippets of pure intelligence. Not long ago I scanned one of these Amanda-infested pages and e-mailed it to a few friends just to show them how a great editor works—and how fortunate I am to have one.
I applaud the entire team at Houghton, which turned itself inside out to publish this book on a brutal schedule and to innovate with the e-book. If it had settled for the lollygagging schedule I outlined in my proposal, this book would be showing up in stores six months after Watson’s televised Jeopardy match. Thanks to Laura Brady, Luise Erdmann, Taryn Roeder, Ayesha Mizra, Bruce Nichols, Lori Glazer, Laurie Brown, Brian Moore, Becky Saikia-Wilson, Nicola Fairhead, and all the other people at Houghton who helped produce this book in record time. Thanks also to my wonderful agent, Jim Levine, and the entire team at Levine-Greenberg.
I remember calling Michael Loughran at IBM on a winter evening and suggesting that this Jeopardy machine might make a good book. He was receptive that night, and remained so throughout. He was juggling four or five jobs at the same time and tending to a number of constituencies, from the researchers in IBM’s War Room to the various marketing teams in Manhattan and the television executives in Culver City. Yet he found time for me and made this book possible. Thanks, too, to his colleagues at IBM, including Scott Brooks, Noah Syken, Ed Barbini, and my great friend and former BusinessWeek colleague Steve Hamm. I also appreciate the help and insights from the team at Ogilvy & Mather, especially David Korchin and Miles Gilbert, who brought Watson’s avatar to life for me.
The indispensable person, of course, was David Ferrucci. If it’s not clear in the book how open, articulate, and intelligent he is, I failed as a writer. He was my guide, not only to Watson’s brain, but to the broader world of knowledge. He was generous with his time and his team. I’m thankful to all of them for walking me through every aspect of their creation. My questions had to try their patience, yet they never let it show.
Harry Friedman welcomed me to the fascinating world of Jeopardy and introduced me to a wonderful cast of characters, including Rocky Schmidt and the unflappable Alex Trebek. Thanks to them all and to Grant Loud, who was always there to answer my calls. I owe a load of New Jersey hospitality to my California hosts, Natalie and Jack Isquith, and my niece Claire Schmidt.
Scores of people, in the tech world and academia, lent me their expertise and their time. I’m especially grateful to my friends at Carnegie Mellon for opening their doors to me, once again, and to MIT. Thanks, too, to Peter Norvig at Google, Prasanna Dhore at HP, Anne Milley at SAS, and the sharpest mind I know in Texas, Neil Iscoe.
And for her love, support, and help in maintaining a sense of balance, I give thanks to my wife, Jalaire. She’d see the forty Jeopardy shows stored on TiVo and say, “Let’s watch something else.”
Notes
[>] It was a September morning: Like Yahoo! and a handful of other businesses, the official name of the quiz show in this story ends in an exclamation point: Jeopardy! Initially, I tried using that spelling, but I thought it made reading harder. People see a word like this! and they think it ends a sentence. Since I use the name Jeopardy more than two hundred times in the book, I decided to eliminate that distraction. My apologies to the Jeopardy! faithful, many of whom are sticklers for this kind of detail.
[>] pressing the button: A few months before the final match, I was talking to the Jeopardy champion Ken Jennings in Los Angeles. Discussing Watson, he suddenly stopped himself. “What do you call it?” he asked. “Him? It?” The question came up all the time, and even among the IBM researchers the treatment wasn’t consistent. When they were programming or debugging the machine, they naturally referred to it as a thing. But when Watson was playing, “it” would turn into a “he.” And occasionally David Ferrucci was heard referring to it as “I.” In the end, I opted for calling the machine “it.” That’s what it is, after all.
[>] He was the closest thing: For narrative purposes, I focused on a handful of researchers in the Jeopardy project, including Jennifer Chu-Carroll, James Fan, David Gondek, Eric Brown, and Eddie Epstein. But they worked closely with groups of colleagues too numerous to mention in the telling of the story. Here are the other members of IBM’s Jeopardy challenge team: Bran Boguraev, Chris Welty, Adam Lally, Anthony (Tony) Levas, Aditya Kalyanpur, James (Bill) Murdock, John Prager, Michael McCord, Jon Lenchner, Gerry Tesauro, Marshall Schor, Tong Fin, Pablo Duboue, Bhavani Iyer, Burn Lewis, Jerry Cwiklik, Roberto Sicconi, Raul Fernandez, Bhuvana Ramabhadran, Andrew Rosenberg, Andy Aaron, Matt Mulholland, Karen Ingraffea, Yuan Ni, Lei Zhang, Hiroshi Kanayama, Kohichi Takeda, David Carmel, Dafna Sheinwald, Jim De Piante, and David Shepler.
[>] most books had too many words: For more technical details on the programming of Watson, see AI Magazine (vol. 31, no. 3, Fall 2010). The entire issue is devoted to Q-A technology and includes lots of information about the Jeopardy project.
[>] smarter Watson wouldn’t have: One of the reasons the fast version of Watson is so hard to manage and update is its data. In order to speed up the machine’s processing of its 75 gigabytes of data, the IBM team processed it all beforehand. This meant that instead of the machine figuring out on the fly the subjects and objects of sentences, this work was done in advance. Watson didn’t need to parse a sentence to conclude that the apple fell on Isaac Newton’s head and not vice versa. Looking at it from a culinary perspective, the researchers performed for Watson the job that pet food manufacturers like Purina carry out for animals: They converted a rich, varied, and complex diet into the informational equivalent of kibbles. “When we want to run a question,” Ferrucci said, “the evidence is already analyzed. It’s already parsed. The people are found, the locations are found.” This multiplied Watson’s data load by a factor of 6—to 500 gigabytes. But it also meant that to replicate the speed of Watson in other domains, the data would likely have to be already processed. This makes answering machines less flexible and versatile.
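The trade-off this note describes, doing all the parsing in advance so that queries hit ready-made annotations instead of raw text, can be sketched in a few lines of illustrative Python. Everything here (the toy `parse` function, the field and corpus names) is my own invention, not IBM’s code; a real pipeline would run a genuine syntactic parser over terabyte-scale text:

```python
# Illustrative sketch of the precomputation trade-off: parse every
# passage once, offline, so queries become lookups against stored
# annotations rather than on-the-fly parsing. All names are invented.

def parse(passage):
    """Stand-in for a real parser: records capitalized words as a
    crude 'entities found' annotation."""
    return {"entities": [w.strip(".,") for w in passage.split() if w[:1].isupper()]}

CORPUS = {
    1: "The apple fell on Isaac Newton's head.",
    2: "Ukraine produced wheat for much of Europe.",
}

# Offline step: annotate everything in advance (the storage bill grows,
# as it did for Watson, but query-time work shrinks).
ANNOTATIONS = {pid: parse(text) for pid, text in CORPUS.items()}

def evidence_for(term):
    """Online step: pure lookup against precomputed annotations;
    no sentence is parsed at query time."""
    return [pid for pid, ann in ANNOTATIONS.items() if term in ann["entities"]]
```

As the note points out, the price of this speed is flexibility: the preprocessing would have to be redone for every new domain’s data.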
[>] “a huge knowledge base”: NELL has a human-instructed counterpart. Called Cyc, it’s a universal knowledge base painstakingly assembled and organized since 1984 by Cycorp, of Austin, Texas. In its scope, Cyc was as ambitious as the eighteenth-century French encyclopedists, headed by Denis Diderot, who attempted to catalogue all of modern knowledge (which had grown significantly since the days of Aristotle). Cyc, headed by a computer scientist named Douglas Lenat, aspired to fill a similar role for the computer age. It would lay out the relationships of practically everything, from plants to presidents, so that intelligent machines could make inferences. If they knew, for example, that Ukraine produced wheat, that wheat was a plant, and that plants died without water, they could infer that a historic drought in Ukraine would curtail wheat production. By 2010, Cyc had grown to nearly half a million terms, linked together with some fifteen thousand types of relations. A squirrel, just to pick one example, has scores of relationships: trees (climbed upon), rats (cousins of), cars (crushed by), hawks (hunted by), acorns (food), and so on. The Cyc team has now accumulated five million facts, or assertions, relating all of the terms to one another. Cyc represents more than six hundred researcher-years but is still limited in its scope. And in the age of information, the stratospheric growth of knowledge seems sure to outstrip the efforts of humans to catalogue it manually.
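The chained inference in the Ukraine example can be illustrated with a toy rule over hand-entered assertions. The triples and predicate names below are my own shorthand, not Cyc’s actual representation, which is vastly richer:

```python
# A toy chain of inference over Cyc-style assertions, stored as
# (subject, relation, object) triples. Predicates are invented
# for illustration only.

FACTS = {
    ("Ukraine", "produces", "wheat"),
    ("wheat", "is_a", "plant"),
    ("plant", "dies_without", "water"),
}

def drought_curtails(region, crop):
    """Chain three assertions: the region produces the crop, the crop
    is a plant, and plants die without water; therefore a drought in
    the region would curtail production of the crop."""
    return (
        (region, "produces", crop) in FACTS
        and (crop, "is_a", "plant") in FACTS
        and ("plant", "dies_without", "water") in FACTS
    )
```

Each added triple makes more such chains possible, which is why the manual effort compounds so quickly, and why, as the note concludes, hand cataloguing struggles to keep up.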
[>] And there were still so many: Before working on a new algorithm for Watson, team members had to come up with a hypothesis for the goals and effectiveness of the algorithm, then launch it on a Wiki where all the team members could debate the concept, refine it, and follow its progress. Here’s an example of one hypothesis: “A Pun-Relation classifier based on a statistical combination of synonymy, ngram associations, substring and sounds like detectors will increase Watson’s accuracy and precision at 70 by more than 10 percent on pun questions while not negatively impacting overall performance on non-pun questions.”
Sources and Further Reading
Bailey, James, Afterthought: The Computer Challenge to Human Intelligence, Basic Books, 1997
Benjafield, John G., Cognition, Oxford University Press, 2007
Bringsjord, Selmer, and David Ferrucci, Artificial Intelligence and Literary Creativity: Inside the Mind of Brutus, A Storytelling Machine, Psychology Press, 1999
Final Jeopardy Page 24