Final Jeopardy


by Stephen Baker

James Fan, meanwhile, was going over clues in which Watson failed to understand the subject. At one meeting at the Hawthorne labs, he brought up an especially puzzling one. In the category Postal Matters, it asked: “The first known air mail service took place in Paris in 1870 by this conveyance.” From its analysis, Watson could conclude that it was supposed to find a “conveyance.” That was the lexical answer type, or LAT. But what was a conveyance? In all of the ontologies it had on hand, there was no such grouping. There were groups of trees, colors, presidents, even flavors of ice cream—but no “conveyances.” And if Watson looked up the word, it would find vague references to everything from communication to the transfer of legal documents. One of its meanings involved transport, but the computer would hardly know to focus its search there.

What to do? Fan was experimenting with a new grouping of LATs. At a meeting of one algorithm team on a June afternoon, he started to explain how he could prepare Watson for what he called weird LATs.

Ferrucci didn’t like the sound of it. “We don’t have any way to mathematically classify ‘weird,’” he objected. “That’s a word you just introduced.” Run-of-the-mill LATs, such as flowers, presidents, or diseases, provided Watson with vital intelligence, dramatically narrowing its search. But an amorphous grouping of “weird” words, he feared, would send the computer off in bizarre directions, looking at more distant relationships in the clue and bringing in masses of erroneous possibilities, or noise.

“There are ways to measure it,” Fan said. “We can look at how many instances there are of the LAT in Yago”—a huge semantic database with details on more than two million entities. “And if it isn’t there, we can classify it as ‘weird.’”

“Just based on frequency?” Ferrucci said. There were only weeks left to program Watson, and he saw this “weird” grouping as a wasteful detour. In the end, he gave Fan the go-ahead. “If something looks hare-brained and it’s only going to take a couple of days, you do it.” But he worried that such last-minute fixes might help Watson on a couple of clues and disorient it on many others. And there were still so many other problems to solve.
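The frequency test Fan describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the Watson team wrote: the lookup table, its labels and counts, the function name, and the threshold are all hypothetical stand-ins for a real query against a knowledge base such as Yago.

```python
from typing import Mapping

# Hypothetical stand-in for a Yago-style index: type label -> number of known instances.
# The labels and counts are invented for illustration; they are not real Yago data.
TYPE_INSTANCE_COUNTS = {
    "president": 44,
    "flower": 1200,
    "disease": 3500,
}


def is_weird_lat(
    lat: str,
    counts: Mapping[str, int] = TYPE_INSTANCE_COUNTS,
    min_instances: int = 1,
) -> bool:
    """Return True when the knowledge base holds fewer than min_instances of the LAT."""
    # A LAT that is absent from the index counts as having zero instances.
    return counts.get(lat.lower(), 0) < min_instances


if __name__ == "__main__":
    for lat in ("president", "conveyance"):
        label = "weird" if is_weird_lat(lat) else "ordinary"
        print(f"{lat}: {label}")
```

With the default threshold of one, a LAT is flagged only when it is missing from the index altogether, which matches Fan's "if it isn't there" rule. Raising `min_instances` would also flag rare types, the kind of change Ferrucci feared might bring in more noise.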

By the end of June, two weeks after Watson graced the cover of the New York Times Magazine, Harry Friedman had come to a decision. The solution was to remove the man-machine match, with all of its complications, from Jeopardy’s programming schedule. “This is an exhibition,” he said, adding that it made the “whole process a lot more streamlined.” Jeopardy would follow its normal schedule. The season of matches would feature only humans. Writers would follow the standard protocols. Nothing would change. The Watson match, with its distinct rules and procedures, would exist in a world of its own. In a call to IBM, Friedman outlined the new rules of engagement. The match would take place in mid-January at IBM Research. It would feature Ken Jennings and Brad Rutter in two half-hour games. The winner, as in all Jeopardy tournaments, would be the player with the highest combined winnings.

Friedman addressed Ferrucci’s concerns about writers’ bias by enlarging the pool of games. Each year the Jeopardy writers produced about a hundred games for the upcoming season, with taping starting in July. A few days before taping, an official from Sullivan Compliance Company, an outside firm that monitors game shows, would select thirty of those games. He would not see the clues or categories and would pick two of the games only by numbers given to them. Once the games were selected, a Jeopardy producer would look at the clues and categories. If any of them overlapped with those that Jennings or Rutter had previously faced, or included the types of audio and visual clues that were off-limits for Watson, the category would be removed and replaced by a similar one from another of the thirty games. If a Melville category recalled one that Jennings had faced in his streak, they might replace it with another featuring Balzac or Whitman. And for Watson’s scientific demonstration, the machine would play fifty-six matches throughout the fall against Tournament of Champions qualifiers. This was the best test stock Jeopardy had to offer—the closest it could come to the two superstars Watson would face in January.

Jeopardy, eager for a blockbuster, had come up with a scheme to manage the risks. After months of fretting, the game was on.

9. Watson Looks for Work

DINNER WAS OVER at the Ferrucci household. It was a crisp evening in Yorktown Heights, a New York suburb ten miles north of the IBM labs. It was dark already, and the fireplace leapt with gas-fed flames. Ferrucci’s two daughters were heading upstairs to bed. In the living room, Ferrucci and his wife, Elizabeth, recounted a deeply frustrating medical journey—one that a retrained Jeopardy computer (Dr. Watson) could have made much easier.

Ferrucci had been envisioning a role for computers in doctors’ offices since his days as a pre-med student at Manhattan College. In graduate school, he went so far as to build a medical expert system to provide advice and answer questions about cardiac and respiratory ills. It worked well, he said, but its information was limited to what he taught it. A more valuable medical aid, he said, would scoop up information from anywhere and come up with ideas and connections that no one had even thought to consider. That was the kind of machine he himself had needed.

Early in the Jeopardy project, Ferrucci said, not long after the bake-off, he started to experience strange symptoms. The skin on one side of his face tingled. A couple of his fingers kept going to sleep. And then, one day, searing pain shot through his head. It lasted for about twenty seconds. Its apparent epicenter was a lower molar on the right side of his mouth. “It felt like someone was driving an ice pick in there,” he said.

When this pain returned, and then came back a third and fourth time, Ferrucci went to his dentist. Probing the tooth and placing ice on it, the dentist attempted to reproduce the same fearsome effects but failed. He could do a root canal, he said, but he had no evidence that the tooth was the problem. Ferrucci then went to a neurologist, who suggested a terrifying possibility. Perhaps, he said, Ferrucci was suffering from trigeminal neuralgia, more commonly known as the suicide disease. It was a nerve disorder so painful that it was believed to drive people to kill themselves. He recommended getting the root canal. It might do the trick, he said, and save them both from the trouble of ransacking the nervous system for answers.

Ferrucci got the root canal. It did no good. The attacks continued. He went to another neurologist, who prescribed anticonvulsion pills. When he read about the medicine’s side effects, Ferrucci said, “I didn’t know whether to take the pills or buy a gun.” He did neither and got an MRI. But putting on the helmet and placing his head in the cramped cylinder filled him with such anxiety that he had to go to another doctor for sedatives.

He had no idea what was wrong, but it wasn’t getting any better. As the Jeopardy project moved along, Ferrucci continued to make presentations to academics, industry groups, and IBM brass. But he started to take along his number two, Eric Brown, as a backup. “If I don’t make it through the talk,” he told Brown, “you just pick up where I leave off.”

In time, Ferrucci started to recognize a certain feeling that preceded the attacks. He sensed it one day, braced himself against a wall, lowered his head slightly, and awaited the pain. It didn’t come. He moved his head the same way the next time and again he avoided the pain. He asked his neurologist about a possible link between the movements of his neck and the facial pain. He was told there was no possible connection.

Months passed. The Ferruccis were considering every option. “Someone told us we should get a special mattress,” said Elizabeth. Then a friend suggested a specialist in craniofacial pain. The visit, Ferrucci learned, was not covered by his insurance plan and would cost $600 out of pocket. He decided to spend the money. A half hour into the meeting, the doctor worked his hands to a spot just below Ferrucci’s collarbone and pressed. The pain rocketed straight to the site of his deadened molar. The doctor had found the connection. A number of small contraction knots in the muscle, myofascial trigger points, created the pain, he said. The muscle was traumatized, probably due to stress. With massage, the symptoms disappeared. And Ferrucci kept them at bay by massaging his neck, chest, and shoulders with a two-dollar lacrosse ball.

He walked to a shelf by the fireplace and brought back a book, The Trigger Point Therapy Workbook, bristling with Post-it notes. On one page was an illustration of the sternocleidomastoid. It’s the biggest visible muscle in front of the neck and extends from the sternum to a bony prominence behind the ear. According to the book, trauma within this muscle could cause pain in the head. In a single paragraph on page 53 was a description of Ferrucci’s condition. It had references to toothaches in back molars and a “spillover of pain … which mimics trigeminal neuralgia.” Ferrucci could have written it himself.

If a computer like Watson, customized for medicine, had access to that trigger point workbook along with thousands of other books and articles related to pain, it could have saved Ferrucci a molar and months of pain and confusion. Such a machine likely would have been able to suggest, with at least some degree of confidence, the connection between Ferrucci’s symptoms and his sternocleidomastoid. This muscle was no more obscure than the Asian ornaments or Scandinavian kings that Watson routinely dug up for Jeopardy. Such a machine would not have to understand the connections it found. The strength of the diagnostic engine would not be its depth, but its range. That’s where humans were weak. Each expert that Ferrucci visited had mastered a limited domain. The dentist knew teeth, the neurologists nerves. But no one person, no matter how smart or dedicated, could stay on top of discoveries across every medical field. Only a machine could do that.

A few months earlier, on an August morning, about a hundred IBM employees filed into the auditorium at the Yorktown labs. They included researchers, writers, marketers, and consulting executives. Their goal was to brainstorm ideas for putting Watson to work outside the Jeopardy studio. The time for games was nearly over. Watson, like thousands of other gifted students around the world, had to start earning its keep. It needed a career.

This was an unusual situation for an IBM product, and it indicated that the company had broken one of the cardinal rules of technology development. Instead of focusing first on a market opportunity and then creating the technology for it, IBM was working backward: It built the machine first and was now wondering what in the world to do with it. Other tech companies were notorious for this type of cart-before-the-horse innovation. Motorola, in the 1990s, led the development of a $5 billion satellite phone system, Iridium, before learning that the market for remote communications was tiny and that most people were satisfied with normal cell phones. Within a year of its launch, Iridium went bankrupt. In 1981, Xerox built a new computer, the 6085 Star, featuring a number of startling innovations—a mouse, an ethernet connection, e-mail, and windows that opened and closed. All of this technology would lay the groundwork for personal computers and the networked world. But it would be other companies, notably Apple and Microsoft, that would take it to market. And in 1981, Xerox couldn’t find buyers for its $16,000 machines. Would Watson’s industry-savvy offspring lead to similar boondoggles?

In fairness to IBM, Grand Challenges, like Watson and the Deep Blue chess machine, boosted the company’s brand, even if it came up short in the marketplace. What’s more, the technology developed in the Jeopardy project, from algorithms that calculated confidence in candidate answers to wizardry in the English language, was likely to work its way into other offerings. But the machine’s question-answering potential seemed so compelling that IBM was convinced Watson could thrive in a host of new settings. It was just a question of finding them.

Ferrucci started the session by outlining Watson’s skills. The machine, he said, understood questions posed in natural language and could read millions of documents and scour databases at lightning speed. Then it could come up with responses. He cautioned his colleagues not to think of these as answers but hypotheses. Why the distinction? In every domain most of Watson’s candidate answers would be wrong. Just as in Jeopardy, it would come back with a list of possibilities. People looking to the machine for certainty would be disappointed and perhaps even view it as dumb. Hypotheses initiate a lengthier process. They open up paths of inquiry. If Watson came back from a hunt with ten hypotheses and three of them looked promising, it wouldn’t matter much if the other seven were idiotic. The person using the system would focus on the value. And this is where the vision of Watson in the workplace diverged from the game-playing model. In the workplace, Watson would not be on its own. Unlike the Jeopardy machine, the Watson Ferrucci was describing would be engineered to supplement the human brain, not supplant it.

The time looked ripe for word-savvy information machines like Watson, thanks to the global explosion of a new type of data. If you analyzed the flow of digital data in, say, 1980, only a smidgen of the world’s information had found its way into computers. Back then, the big mainframes and the new microcomputers housed business records, tax returns, real estate transactions, and mountains of scientific data. But much of the world’s information existed in the form of words—conversations at the coffee shop, phone calls, books, messages scrawled on Post-its, term papers, the play-by-play of the Super Bowl, the seven o’clock news. Far more than numbers, words spelled out what humans were up to, what they were thinking, what they knew, what they wanted, whom they loved. And most of those words, and the data they contained, vanished quickly. They faded in fallible human memories, they piled up in Dumpsters and moldered in damp basements. Most of these words never reached computers, much less networks.

That has all changed. In the last decade, as billions of people have migrated their work, mail, reading, phone calls, and webs of friendships to digital networks, a giant new species of data has arisen: unstructured data. It’s the growing heap of sounds and images that we produce, along with trillions of words. Chaotic by nature, it doesn’t fit neatly into an Excel spreadsheet. Yet it describes the minute-by-minute goings-on of much of the planet. This gold mine is doubling in size every year. Of all the data stored in the world’s computers and coursing through its networks, the vast majority is unstructured. Hewlett Packard, for example, the biggest computer company on earth, gets a hundred fifty million Web visits a month. That’s nearly thirty-five hundred customers and prospects per minute. Those visits produce data. So do notes from the company’s call centers, online chat rooms, blog entries, warranty claims, and user reviews. “Ninety percent of our data is unstructured,” said Prasanna Dhore, HP’s vice president of customer intelligence. “There’s always a gap between what you want to know about the customer and what is knowable.” Analysis of the pile of data helps reduce that gap, bringing the customer into sharper focus.

The potential value of this information is immense. It explains why Facebook, a company founded in 2004, could have a market value six years later of $50 billion. The company gathers data, most of it unstructured, from about half a billion people. Beyond social networks and search engines, an entire industry has sprung up to mine this data, to predict people’s behavior as shoppers, drivers, workers, voters, patients, even potential terrorists. As machines, including Watson, have begun to chomp on unstructured data, a fundamental shift is occurring. While people used to break down their information into symbols a computer could understand, computers are now doing that work by themselves. The machines are mastering human communication.

This has broad implications. Once computers can handle language, every person who can type or even speak becomes a potential programmer, a data miner, and an analyst. This is the march of technology. We used to have typists, clerks, legions of data entry experts. With the development of new tools, these jobs became obsolete. We typed (and spell-checked), laid out documents, kept digital records, and even developed our own pictures. Now, a new generation of computers can understand ordinary English, hunt down answers in vast archives of documents, analyze them, and come up with hypotheses. This has the potential to turn entire industries on their heads.

In the August meeting, Ferrucci told the audience the story of his recent medical odyssey and how a machine like Watson could have helped. Others suggested that Watson could man call centers, function as a brainy research assistant in pharmaceutical labs, or work as a whip-smart paralegal, with nearly instant recall of the precedents, both state and federal, for every case. They briefly explored the idea of Watson as a super question-answering Google. After all, it could carry out a much more detailed analysis of questions and piece together sophisticated responses. But this idea went nowhere. IBM had no experience in the commercial Web or with advertisers. Perhaps most important, Watson was engineered to handle one Jeopardy clue at a time. In those same three seconds, a search engine like Google’s or Microsoft’s Bing handled millions of queries. To even think about competing, the IBM team would have to build an entirely new and hugely expensive computing architecture. It was out of the question.

No, Watson’s future was as an IBM consulting tool and there were plenty of rich markets to explore. But before Watson could make a go of it, Big Blue would have to resolve serious questions. First, how much work and expense would it take to adapt Watson to another profession, to curate a new body of data and to educate the machine in each domain? No one could say until they tried. Second, and just as important, how much resistance would these new knowledge engines encounter? New machines, after all, are in the business of replacing people—not something that often generates a warm welcome. The third issue involved competition. Assuming that natural-language, data-snarfing, hypothesis-spouting machines made it into offices and laboratories, who was to say that they’d be the kin of a Jeopardy contraption? Other companies, from Google to Silicon Valley startups, were sure to be competing in the same market. The potential for these digital oracles was nearly limitless. But in each industry they faced obstacles, some of them considerable.
