Final Jeopardy

by Stephen Baker


  To be fair, the Jeopardy executives understood this issue and were committed to avoiding the problem. The writers would be kept in the dark. They wouldn’t know which of their clues and categories would be used in the Watson showdown. According to the preliminary plans, they would be writing clues for fifteen Tournament of Champions matches, and Watson would be playing only one of them. But Ferrucci didn’t think this was sufficient. One way or another, the writers would be influenced by the prospect of facing Watson, or at least they might be. From a scientific standpoint, there was no distinction between the existence and the possibility of bias. Either way, the results were compromised. Fifteen games, he said, was not a big enough set. “That’s not statistically significant.”

  Epstein said that claims of bias always came up in man-machine contests, because humans always changed their behavior when faced with a machine while other humans were busy tweaking the machine. “Even in the Deep Blue chess game,” he said, “Kasparov was complaining bitterly that the IBM team cheated.” But how could a machine cheat in chess? “Nobody’s writing questions,” he said.

  The concern in the chess match, Ferrucci said, was that the humans responded to Kasparov’s tactics and retuned the computer. Kasparov had already adjusted to the computer’s strategy and then found himself facing another one. “He was very offended by that,” Ferrucci said.

  “So it was unfair for the machine to change its strategy,” Epstein asked, “but OK for the man to change his?”

  Throughout the meal, they discussed the nature of competitions between people and machines. They weren’t new, by any stretch. But earlier in the process, they had seemed more theoretical. Now, with Jeopardy laying down the law, theory was colliding with reality.

  “I have a question for you,” Epstein said at one point. “Has anyone discussed what risks Jeopardy has in this?”

  “It raises interesting issues,” Ferrucci said. “One of them is, do they have a horse in the race? Do they want something in particular to happen? We don’t control anything but our machine,” he went on. “We want our machine to win. This is not a mystery. Jeopardy holds a different set of cards.”

  “They want it to be entertaining,” Loughran said.

  “But what does it mean for the show for the computer to win or lose?” Ferrucci asked. “What does it mean for the show if the human, let’s say, clobbers the computer? These are open questions. They’re in a tough spot, because on the one hand they have to maintain the [show’s] integrity. But at the same time, there’s a perception issue, and people might think: ‘Gee, would Jeopardy be obsolete if the computer won? Would this change the game?’”

  “No way,” Loughran said.

  “You don’t think so,” Ferrucci said, “but they have to be asking the question.” He paused and ate quietly for a few moments. This marketing side of the project, which made it so exciting, was also causing stress. He was spending more and more time dealing with the Jeopardy team and the PR machine and less time in the lab. He was having trouble sleeping. He turned back to Loughran. “So,” he asked, “knowing everything you know now, would you still do this project?”

  “Sure,” Loughran said. “And you?”

  “I’m a science guy, so I absolutely would,” Ferrucci said. He had been able to build his machine, after all, despite his concerns about how the Jeopardy match would play out. “But if I was a marketing guy,” he added, “I’m not so sure …”

  “We’ve got some issues, but it’s fun,” Loughran said. “We’ll get through it all.”

  In the following days, Ferrucci looked to buffer the science of the Jeopardy challenge from the intrusions of the marketing effort and from the carnival odds of a one-game showdown. He devised a two-track approach for Watson, one for the scientific record, the other for the show biz extravaganza. What he wanted, he said, was a set of sixty sparring rounds in the fall of 2010 with the top Jeopardy players—Tournament of Champions qualifiers. These test games would be played on boards written for humans. There would be no bias toward the machine, unconscious or not. Watson would win some of the matches and lose others. But those games would represent its record against a high level of competition. It would establish a benchmark for Q-A technology and produce a valuable set of data. Even if Watson went on to stumble on national television, its reputation among the tech and scientific communities would be assured. “Those games will be where we’ll get the real statistics on how we did,” he said. “The final game is fun. But these sixty matches will be the real study.”
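Ferrucci’s preference for sixty games over a single televised one can be made concrete with the textbook margin of error for a win rate. The sketch below is an illustration, not the team’s actual analysis: it assumes the games are independent, uses the normal approximation, and borrows the roughly 65 percent win rate Watson posted in its sparring sessions as the true rate. The function name `margin_of_error` is hypothetical.

```python
import math


def margin_of_error(win_rate: float, games: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a win rate estimated from `games` games.

    Assumes independent games and the normal approximation to the binomial.
    """
    return z * math.sqrt(win_rate * (1 - win_rate) / games)


# Assumed true win rate, taken from the sparring figure in the text.
P = 0.65
for n in (15, 60):
    print(f"{n:>2} games: {P:.2f} +/- {margin_of_error(P, n):.3f}")
```

Because the margin shrinks with the square root of the number of games, quadrupling the set from fifteen to sixty cuts it in half; a single game tells you almost nothing about the underlying rate.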

  Through the month of April, on conference calls and in meetings, Ferrucci repeatedly voiced his concerns to the Jeopardy team. He wasn’t concentrating on the finger anymore. He had made that concession, and a hardware team at IBM was busy creating one. They estimated that it would slow Watson’s response time by eight milliseconds. But Ferrucci continued to push for the sixty matches with champions. In April, Jeopardy’s Friedman and Schmidt came to watch a sparring match. In the meeting with them that followed, Ferrucci went on at length about unconscious writers’ bias and tainted questions. “Dave really hammered on these points,” said one participant. The Jeopardy executives defended their processes and protocols. The conversation grew heated. A camera crew was filming the meeting for a documentary. They were asked to leave.

  That was when Jeopardy, in Friedman’s term, “stepped back.” In late April, Friedman’s team sent word to IBM that they were reconsidering every aspect of the competition, including the match itself. With this news, Watson was suddenly put into the same powerless position as thousands of other Jeopardy wannabes: waiting for an invitation. Unlike the aspiring human players, though, Watson had no other occupation, no other purpose on earth. What’s more, it had the hopes of a $96 billion corporation resting on it. And within weeks, millions of New York Times readers would be learning about the coming match in a Sunday magazine cover story—unless Loughran, IBM’s press officer, alerted the Times that the match was in trouble. He kept quiet, trusting that the two sides would resolve their disagreements.

  A week later, Friedman was sitting in his office on the Sony lot in Culver City. The walls were plastered with photographs and awards from his forty-year career in game shows, his seven Emmys, and his Cable and Broadcasting Hall of Fame plaque. It had been a tense day. That morning he had had another contentious phone conversation with Ferrucci, according to IBM. And he had to iron out strategy with Rocky Schmidt and Lisa Broffman, another producer on the show, before Schmidt flew to Europe the next day. “We’ve been so immersed in this,” Friedman said, minutes after meeting with Schmidt, “that we’re stepping back just a little bit and thinking of the various ramifications. We’re analyzing every aspect now. This is a big deal.”

  Ferrucci’s concerns about bias left the Jeopardy executives feeling exposed. The IBM scientist, after all, was implying that Jeopardy’s writers might tilt the match toward one side or the other—or at least be perceived as doing so. Ferrucci was always careful to ascribe this possibility to unconscious bias. But for Jeopardy, a franchise born from the quiz show scandals of the 1950s, the hint of such bias—conscious or not—was poisonous. And even if Ferrucci kept this concern to himself, the point he made repeatedly was that other scientists would raise the very same questions. If it was even within the realm of possibility that Jeopardy had an interest in the outcome and if it used its own people to write the clues, the fairness of the game and the validity of the contest were compromised.

  For Friedman, who took pride in lending the Jeopardy platform to science, this was tough to swallow. “[IBM] could have done this with a bunch of questions that academics came up with,” he said. “But they wanted this fabulous platform. They gain the platform and lose control.” He maintained that the future of the franchise hinged on its reputation for fairness and integrity and that if the match went forward, his team would be laying down the rules. “We rigidly adhere to not only our own code of conduct, but also obviously to the FCC regulations,” he said. “We run a pretty tight ship.”


  He described how the contestants are sequestered during the filming, accompanied by handlers and prohibited from mingling with anyone with access to the clues. He recalled one time that Ken Jennings, hurrying to change a tie that “strobed on camera,” ducked into a little nook where Alex Trebek checked his appearance before stepping onto the set. This was a breach. The three players had to always stick together, under surveillance, so that no one could even be suspected of receiving favorable treatment. Jennings was quickly ousted as if he’d been a North Korean commander strolling into a meeting of the Joint Chiefs at the Pentagon. Friedman laughed. “He could have been shot.” Then he play-acted. “Oh, sorry Ken, we had to wing you in your foot there, but your buzzer thumb seems to be intact. Are you OK to play the next show? You wandered into a secure area …”

  Friedman brushed off Ferrucci’s suggestion that the results of the game could have a lasting impact on the Jeopardy franchise, much as Kasparov’s loss to Deep Blue forever changed chess. He laughed. “When all of this, as wonderful as it is, is over, we’re going to continue playing our game. We’re going to continue what got us here through six thousand shows.” The message to IBM: “Thanks for coming. Thanks for playing. We’re back to our day jobs.”

  The tentative plan had been for the IBM team to move Watson to the Culver City studios in late 2010. It would participate in a championship match, playing against Ken Jennings and the winner of an invitational tournament of past champions. But bringing the machine into Jeopardy’s “tightly run ship,” it was now clear, raised complications, including demands to change the show’s tried-and-tested procedures. It raised the risk of rancor and public accusations. And it wasn’t just the scientists who might complain. The humans would be playing for a million-dollar prize, underwritten by IBM. If they suspected any tilting in the competition, they were sure to speak up as well. In a sense, Watson’s intrusion into the Jeopardy world represented a potential breach of its own. Friedman had to weigh his options.

  One of Jeopardy’s biggest fears, Ferrucci believed, was that Watson would grow dramatically smarter and faster over the summer and lay waste to its human foes. This was early May, weeks after Jeopardy had begun to reconsider the match. He was sitting in the empty observation room on the Jeopardy set in Yorktown. At the podium on the other side of the window, Watson had been beating humans in sparring sessions about 65 percent of the time but showing few signs of frightening dominance. The Jeopardy crew, he said, continued to assess the matches. “Is this fun, is this entertaining, is this speaking to our audience?” A superendowed Watson, conceivably, would drain the match of all suspense. In that case, according to Ferrucci, “People would say, ‘Of course computers can beat humans! Why did you promote all this?’”

  Ferrucci wished it were true, that with a few devilishly smart new algorithms Watson would leap forward into a class of its own. That way he might sleep better. But he didn’t see it happening. “We’re working our butts off,” he said. “But I don’t think we’re going to see a lot of difference in Watson’s performance four months from now, when we have to freeze the system. But they don’t know that,” he said. “How could they know? They’re not doing the science.”

  Jeopardy’s executives also worried, he said, that IBM could jack up Watson’s speed simply by adding more computing power. This was logical. But it was not the case. In distributing Watson’s work to more than two thousand processors, the IBM team had broken it into hundreds of smaller tasks, most of them operating in parallel. But a handful of these jobs, Ferrucci explained, required sequential analysis. Whether it was parsing a sentence or developing a confidence ranking for a potential answer, certain basic algorithms had to follow strings of commands, with each step hinging on the previous one. This took time.

  Think of a billionaire selecting his outfit for a black-tie event. He can assign some tasks to his minions. One can buy socks while others track down shoes, pants, and a shirt. Those jobs, in computer lingo, run in parallel. But when it comes to getting dressed, the work becomes sequential. The man must place one leg in his pants, then the other. Maybe a few butlers could help with his socks simultaneously and hold out the arms of his shirt for him, but such opportunities are limited. This sequence, to the last snap of the cuff links, takes time.

  Inside Watson, some of the sequential algorithms gobbled up a quarter of a second, half a second, even more. And they could not be shared among many machines. Watson, in all likelihood, would need the same two to five seconds by the date of the final match. At this point, the only path to greater speed was to come up with simpler commands—smarter algorithms that led Watson through fewer steps. But Ferrucci didn’t expect advances of more than a few milliseconds in the coming months. Nonetheless, he found it hard to make his case to the Jeopardy team. From their perspective, Watson had risen from a slow-witted assortment of software into a champion-caliber player in two years. Who was to say it wouldn’t keep improving?
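Ferrucci’s argument about speed is, in essence, the reasoning behind Amdahl’s law: work that can be split across processors shrinks as you add hardware, but steps that must run one after another do not. The toy model below is a back-of-envelope sketch, not a measurement of Watson. The 1.5 seconds of sequential work and 3,000 processor-seconds of parallel work are hypothetical numbers, chosen only so that 2,000 processors lands inside the two-to-five-second range described above.

```python
def latency(processors: int,
            sequential_s: float = 1.5,
            parallel_work_s: float = 3000.0) -> float:
    """Estimated time to answer one clue, in seconds.

    sequential_s: hypothetical time for steps that must run one after another
        (like the billionaire putting on his pants, one leg at a time).
    parallel_work_s: hypothetical total processor-seconds of work that can be
        split evenly across processors (like sending minions for socks and shoes).
    """
    return sequential_s + parallel_work_s / processors


for n in (2_000, 4_000, 8_000, 1_000_000):
    print(f"{n:>9} processors: {latency(n):.3f} s")
```

In this model, doubling the hardware from 2,000 to 4,000 processors trims the answer time from 3.0 to 2.25 seconds, but no number of machines pushes it below the 1.5-second sequential floor. Only shortening the sequential chain itself, the “smarter algorithms” Ferrucci mentions, lowers that floor.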

  In this jittery home stretch, it was becoming clear, the two sides shared parallel fears. While Hollywood worried that the computer would grow too smart, the IBM team focused on its vulnerabilities and fretted that it would fail. Watson’s weekly blunders in the sparring sessions added to the long lists of bugs to eliminate, mauled pronunciations to remedy, potential gaffes to program around. There wasn’t enough time to address them all. In the same pragmatic spirit that had marked the entire enterprise, they carried out time-benefit analyses on their list of items and focused on the ones at the top. “This is triage,” said Jennifer Chu-Carroll.

  One small but vital job was to equip Watson with a profanity filter. The machine had already demonstrated, by dropping the F-bomb on its answer panel, how heedless it could be to basic norms of etiquette and decency. The simplest approach would be to prohibit it from even considering the seven forbidden words that George Carlin made famous in his comedy routines, plus a handful of others, including ethnic and racial slurs. It would be easy to draw up a set of rules—heuristics—to override the machine’s statistically generated candidate answers. But what about words that included no-no’s? Consider this 2006 clue in the category T Birds: “In North America this term is properly applied to only 4 species that are crested, including the tufted.” Would a list of forbidden vulgarities impede Watson from answering, “What is a titmouse?” Researchers, said David Gondek, would have to come up with “loose filters,” leaving room for such exceptions. But they were sure to miss some.
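The titmouse problem shows why a naive filter misfires. The sketch below contrasts a plain substring check with a “loose” filter that still catches blocked terms embedded in longer words but lets an allowlist of known exceptions through. It illustrates the general idea only; Watson’s actual filter is not described in detail, and the blocklist and allowlist entries here are hypothetical placeholders.

```python
import re

# Hypothetical placeholder lists; not Watson's actual vocabulary.
BLOCKED = {"tit", "darn"}
ALLOWED = {"titmouse", "titmice"}  # exceptions the loose filter lets through


def naive_filter(answer: str) -> bool:
    """Return True if the answer should be suppressed (any substring match)."""
    text = answer.lower()
    return any(term in text for term in BLOCKED)


def loose_filter(answer: str) -> bool:
    """Return True if any word contains a blocked term, unless it is allowlisted."""
    for word in re.findall(r"[a-z]+", answer.lower()):
        if word in ALLOWED:
            continue
        if any(term in word for term in BLOCKED):
            return True
    return False


for response in ("What is a titmouse?", "What is darn?"):
    print(response, naive_filter(response), loose_filter(response))
```

The naive check suppresses the correct response to the T Birds clue; the loose version passes it while still flagging the placeholder term. As Gondek warned, any such allowlist is only as good as the exceptions someone thought to add.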

  Then there was the matter of pronunciation. Watson could turn an everyday word into a profanity with just a slip of its mathematically programmed tongue. This was even more likely with foreign words. How would it fare, for example, answering this 2007 clue in the Plane Crazy category? “In 1912 this Dutch plane builder set up a plant near Berlin; later, his fighter planes were flown by the Red Baron.” This would likely be a slam-dunk for Watson, but leading it to correctly enunciate “What is Fokker?” would involve meticulous calibration of its vowel pronunciation. Surely, some would say, Jeopardy would not include a Fokker clue in a match involving a machine. But that would revive Ferrucci’s key concern: that Jeopardy would be customizing the game for Watson. In the end, Watson’s scientists could only fashion a profanity filter, make room for the most common exceptions, tweak potentially problematic pronunciations, and hope for the best. If the machine, despite their work, found a way to say something outrageous, it would be up to the show’s producers to bleep it out.

  While her colleagues steered Watson away from gaffes, Chu-Carroll was concentrating on Final Jeopardy, an area of mounting concern for Ferrucci’s team. Final Jeopardy was often decisive. Throughout Watson’s training, the team had studied and modeled all of the clues as a single group. They knew from the beginning that the Final Jeopardy clues were trickier—“less direct, more implicit,” in Chu-Carroll’s words—but their data set of these clues was much smaller, only one sixty-first of the total. Because of this, the computer was still treating the Final Jeopardy clue like every other clue on the board, coming up with its answer in three to five seconds—and then just waiting as the thirty-second jingle went through its sixty-four notes. This was enough time for trillions of additional calculations. Wasn’t there a way to take advantage of the extra seconds?

  The team was not about to devise new ways to find answers. That would require major research. But Watson could take more time to analyze the answers it collected. The method, like most of Watson’s cognitive work, would require exhaustive and repetitive computing. The idea was to generate from each answer a series of declarative statements, then check to see if they looked right. In the category English Poets, for example, one recent Final Jeopardy clue had read: “Translator Edward Fitzgerald wrote that her 1861 ‘death is rather a relief to me … no more Aurora Leighs, thank God.’” Let’s say Watson came up with measurable confidence in three potential names, Alfred Lord Tennyson, Emily Dickinson, and Elizabeth Barrett Browning. It could then proceed to craft statements, putting each name in the following sentences: “_____ died in 1861,” “_______ wrote Aurora Leigh,” “_______ was an English poet.” Naturally, some of the sentences would turn out to be foolish, perhaps: “_________ found relief in death” or “________ died, thank God.” In any case, for each of dozens of sentences, Watson would race through its database looking for matches. This represented an immense amount of work. But the results could boost its confidence in the correct response—“Who is Elizabeth Barrett Browning?”—and guide it toward acing Final Jeopardy.
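The statement-checking idea can be sketched in a few lines: drop each candidate into a set of templates, search a corpus for each resulting sentence, and add the share of supported statements to the candidate’s existing confidence. This is a simplified illustration, not IBM’s implementation. The prior confidences, the tiny corpus, the exact-substring matching, and the `weight` parameter are all hypothetical; a real system would need far fuzzier matching against a far larger store of text.

```python
# Hypothetical prior confidences for the three candidates in the example.
CANDIDATES = {
    "Alfred Lord Tennyson": 0.30,
    "Emily Dickinson": 0.25,
    "Elizabeth Barrett Browning": 0.35,
}

# Statement templates built from the clue; "{}" marks the blank.
TEMPLATES = ["{} died in 1861", "{} wrote Aurora Leigh", "{} was an English poet"]

# A toy stand-in for Watson's database of text.
CORPUS = [
    "Elizabeth Barrett Browning died in 1861 in Florence.",
    "Elizabeth Barrett Browning wrote Aurora Leigh, a novel in verse.",
    "Elizabeth Barrett Browning was an English poet of the Victorian era.",
    "Alfred Lord Tennyson was an English poet and Poet Laureate.",
    "Emily Dickinson was an American poet.",
]


def support(candidate: str) -> float:
    """Fraction of generated statements found verbatim somewhere in the corpus."""
    statements = [t.format(candidate).lower() for t in TEMPLATES]
    docs = [d.lower() for d in CORPUS]
    hits = sum(any(s in d for d in docs) for s in statements)
    return hits / len(statements)


def rerank(candidates: dict[str, float], weight: float = 0.5) -> list[tuple[str, float]]:
    """Combine each prior confidence with its evidence score, best first."""
    scores = {c: prior + weight * support(c) for c, prior in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, score in rerank(CANDIDATES):
    print(f"{score:.3f}  {name}")
```

Here all three statements check out for Browning, one for Tennyson, and none for Dickinson, so Browning rises to the top. The cost Baker describes is visible even in this toy: every candidate multiplies the number of statements, and every statement means another pass through the corpus.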
