It must be conceded that not everyone shares this enthusiasm. Some important fields of endeavor remain opposed. Perhaps the biggest irony is that partisan politics have kept the American census anti-Bayesian, despite Laplace’s vision that enlightened governments would adopt it.
Anglo-American courtrooms are also still largely closed to Bayes. Among the few exceptions was a case in 1994 where Bayes was used to demonstrate that New Jersey state troopers singled out African American drivers for traffic stops. During a rape trial in the 1990s, British lawyers tried teaching judges and juries how to assess evidence using Bayesian probability; judges concluded that the method “plunges the jury into inappropriate and unnecessary realms of theory and complexity.”2

Forensic laboratory science in Great Britain and Europe is a different story. Unlike the FBI Laboratory in the United States, the Forensic Science Service in Britain has followed Lindley’s advice and now employs Bayesian methods extensively to assess physical evidence. Continental European laboratories have developed a quantitative measure for the value of various types of evidence, much as Turing and Shannon used Bayes to develop bans and bits as units of measurement for cryptography and computers. Bayes—tactfully referred to in forensic circles as the “logical” or “likelihood ratio” approach—has been applied successfully in cases where numbers were available, particularly in DNA profiling. Because DNA databanks involve probabilities about almost unimaginably tiny numbers—one in 20 million, say, or one in a billion—they may eventually open more courtroom doors to Bayesian methods.
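In its odds form, Bayes’ rule says that posterior odds equal prior odds times the likelihood ratio, and independent pieces of evidence simply multiply their ratios together; in Turing’s bans, their logarithms add. A minimal sketch, with invented numbers standing in for a real case:

```python
import math

def update_odds(prior_odds, likelihood_ratios):
    """Posterior odds = prior odds x the product of the likelihood
    ratios, each one P(evidence | prosecution) / P(evidence | defense)."""
    posterior_odds = prior_odds
    for lr in likelihood_ratios:
        posterior_odds *= lr
    return posterior_odds

# Invented case: a suspect drawn from a pool of a million people,
# plus a DNA match with a one-in-20-million random-match probability,
# i.e., a likelihood ratio of 20 million.
odds = update_odds(prior_odds=1 / 1_000_000, likelihood_ratios=[20_000_000])
print(f"posterior probability: {odds / (1 + odds):.3f}")  # ~0.952
print(f"weight of evidence: {math.log10(20_000_000):.1f} bans")  # ~7.3
```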
Bayes made headlines in 2000 by augmenting DNA evidence with statistical data to conclude that Thomas Jefferson had almost certainly fathered six children by his slave Sally Hemings. DNA from Jefferson’s and Hemings’s families had already provided strong evidence that the third president and the author of the Declaration of Independence fathered Hemings’s youngest son. But Fraser D. Neiman, the director of archaeology at Jefferson’s Monticello plantation, studied whether Hemings’s other conceptions fell during or close to one of Jefferson’s sporadic visits to Monticello. Then he used Bayes to combine the prior historical testimony and DNA evidence with probable hypotheses based on Jefferson’s calendar. Assuming a 50–50 probability that the prior evidence was true, Neiman concluded it was nearly certain—99% probable—that Jefferson had fathered Hemings’s six children.
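Neiman’s likelihoods came from Jefferson’s travel records; the figure below is an invented stand-in, meant only to show the mechanics of moving from a 50–50 prior to a 99% posterior:

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x LR."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A 50-50 prior has prior odds of 1, so the answer is driven entirely
# by how much likelier the conception dates are if Jefferson was the
# father. A likelihood ratio of 99 (a hypothetical value, not
# Neiman's) is what it takes to reach the reported 99%:
print(posterior(0.5, 99))  # 0.99
```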
In economics and finance Bayes appears at multiple levels, ranging from theoretical mathematics and philosophy to nitty-gritty money making. The method figured prominently in three Nobel Prizes awarded for theoretical economics, in 1990, 1994, and 2004. The first Nobel involved the Italian Bayesian de Finetti, who anticipated the Nobel Prize–winning work of Harry Markowitz by more than a decade. Mathematical game theorists John C. Harsanyi and John Nash (the latter the subject of a book and movie, A Beautiful Mind) shared a Bayesian Nobel in 1994. Harsanyi often used Bayes to study competitive situations where people have incomplete or uncertain information about each other or about the rules. Harsanyi also showed that Nash’s equilibrium for games with incomplete or imperfect information was a form of Bayes’ rule.
In 2002 Bayes won perhaps not an entire Nobel Prize but certainly part of one. Psychologist Daniel Kahneman shared the prize for work he had done with Amos Tversky, who died before it was awarded; the two showed that people do not make decisions according to rational Bayesian procedures. People answer survey questions differently depending on their phrasing, and physicians choose surgery or radiation for cancer patients depending on whether the treatments are described in terms of mortality or survival rates. Although Tversky was widely regarded as a philosophical Bayesian, he reported his results using frequentist methods. When James O. Berger of Duke asked him why, Tversky said it was simply a matter of expedience. During the 1970s it was more difficult to publish Bayesian research. “He just took the easy way out,” Berger said.
Alan Greenspan, former chairman of the Federal Reserve, said he used Bayesian ideas to assess risk in monetary policy. “In essence, the risk-management approach to monetary policymaking is an application of Bayesian decisionmaking,” Greenspan told the American Economic Association in 2004.3 The audience of academic and government economists gasped; few experts in finance analyze empirical data with Bayes.
Economists were still catching their breaths when Martin Feldstein, professor of economics at Harvard, stood up at the same meeting and delivered a crash course in Bayesian theory. Feldstein had been President Ronald Reagan’s chief economic advisor and was president of the National Bureau of Economic Research, a leading research organization. He learned Bayesian theory at the Howard Raiffa–Robert Schlaifer seminars at Harvard Business School in the 1960s. Feldstein explained that Bayes lets the Federal Reserve weigh a low-probability risk of disaster more heavily than a higher-probability risk that would cause little damage. And he likened Bayes to a man who has to decide whether to carry an umbrella even when the probability of rain is low. If he carries an umbrella but it does not rain, he is inconvenienced. But if he does not carry an umbrella and it pours, he will be drenched. “A good Bayesian,” Feldstein concluded, “finds himself carrying an umbrella on many days when it does not rain.”4
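Feldstein’s umbrella is a standard expected-loss calculation. A sketch with illustrative costs (the numbers are assumptions, not Feldstein’s):

```python
def expected_loss(action, p_rain, cost_carry=1, cost_drenched=50):
    """Expected loss of each action, given the probability of rain."""
    if action == "carry":
        return cost_carry             # a small, certain inconvenience
    return p_rain * cost_drenched     # a gamble: drenched only if it rains

# With these (assumed) costs, carrying pays whenever
# p_rain > cost_carry / cost_drenched = 0.02.
for p in (0.01, 0.05, 0.30):
    best = min(("carry", "leave"), key=lambda a: expected_loss(a, p))
    print(f"p(rain) = {p:.2f}: {best} the umbrella")
```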
Four years later rain flooded the financial markets and banking. Greenspan, who by then had retired from the Federal Reserve, told Congress he had not foreseen the collapse of the real-estate lending bubble in 2008. He did not blame the theory he used but his economic data, which “generally covered only the past two decades, a period of euphoria . . . [instead of] historic periods of stress.”5
But did Greenspan actually employ Bayesian statistics to quantify empirical economic data? Or were Bayesian concepts about uncertainty only a handy metaphor? Former Federal Reserve Board governor Alan S. Blinder of Princeton thought the latter; when he said so during a talk, Greenspan, who was in the audience, did not object.
In pragmatic contrast to abstract Bayes at the Nobel ceremonies and philosophical Bayes at the Federal Reserve, the rule stands behind one of the most successful hedge funds in the United States. In 1993 Renaissance Technologies hired away from IBM a Bayesian group of voice recognition researchers led by Peter F. Brown and Robert L. Mercer. They became comanagers of RenTech’s portfolio and technical trading. For several years, their Medallion Fund, limited to former and current employees, averaged annual returns of about 35%. The fund bought and sold shares so rapidly one day in 1997 that it accounted for more than 10% of all NASDAQ trades.
To search for the nonrandom patterns and movements that will help predict markets, RenTech gathers as much information as possible. It begins with prior knowledge about the history of prices and how they fluctuate and correlate with each other. Then the company continuously updates that prior base. As Mercer explained, “RenTec gets a trillion bytes of data a day, from newspapers, AP wire, all the trades, quotes, weather reports, energy reports, government reports, all with the goal of trying to figure out what’s going to be the price of something or other at every point in the future. . . . We want to know in three seconds, three days, three weeks, three months. . . . The information we have today is a garbled version of what the price is going to be next week. People don’t really grasp how noisy the market is. It’s very hard to find information, but it is there, and in some cases it’s been there for a long, long time. It’s very close to science’s needle-in-a-haystack problem.”
Like investors at RenTech, astronomers, physicists, and geneticists use Bayes to discern elusive phenomena almost drowning in unknowns. Scientists may face hundreds of thousands of variables without knowing which ones produce the best predictions; Bayes lets them estimate the most probable values of their unknowns.
When Supernova 1987A exploded, astronomers detected precisely 18 neutrinos. The particles originated from deep inside the star and were the only clues about its interior, so the astronomers wanted to extract as much information as possible from this minuscule amount of data. Tom Loredo, a graduate student at the University of Chicago, was told to see what he could learn. Because the supernova was a one-of-a-kind opportunity, frequency-based methods did not apply. Loredo began reading papers by Lindley, Jim Berger, and other leading Bayesians and discovered that Bayes would let him compare various hypotheses about his observations and choose the most probable. His Ph.D. thesis from 1990 wound up introducing modern Bayesian methods to astronomy.
Since then, Bayes has found a comfortable niche in high-energy astrophysics, x-ray astronomy, gamma ray astronomy, cosmic ray astronomy, neutrino astrophysics, and image analysis. In physics, Bayes is hunting for elusive neutrinos, Higgs boson particles, and top quarks. All these problems deal with needles in haystacks, and Loredo now uses Bayes at Cornell University in a new field, astrostatistics.
In much the same way, biologists who study genetic variation are limited to tiny snippets of information almost lost among huge amounts of meaningless and highly variable data in the chromosomes. Computational biologists searching for genetic patterns, motifs, markers, and disease-causing misspellings must extract the weak but important signals from the deafening background noise that masquerades as information.
Susan Holmes, a professor in Stanford’s statistics department, works in computational and molecular biology on amino acids. Some are extremely rare, and if she used frequentist methods she would have to assign them a probability of zero. Adopting the cryptographic technique used by Turing and Good at Bletchley Park, she tries to crack the genetic code by assigning missing species a small probability.
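The Bletchley technique is known today as Good-Turing estimation: the probability that the next observation is a never-before-seen species is estimated from the number of species seen exactly once. A minimal sketch, with an invented amino-acid sample:

```python
from collections import Counter

def unseen_mass(sample):
    """Good-Turing estimate of the probability mass that belongs to
    species never observed in the sample: N1 / N, where N1 counts the
    species seen exactly once and N is the sample size."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

# Invented residue sample: S, W, and C each appear once, so a share
# of probability is reserved for amino acids not seen at all, rather
# than the frequentist zero.
print(unseen_mass("LLLLAAAGGGSWC"))  # 3/13, about 0.23
```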
Given that the DNA in every biological cell contains complete instructions for making every kind of protein in the body, what differentiates a kidney cell from a brain cell? The answer depends on whether a particular gene is turned on or off and whether the genes work together or not. Holmes assembles huge microarrays of genetic data filled with noise and other distractions that may hide a few important signals from the turned-on genes. Each microarray consists of many genes arrayed in a regular pattern on a small glass slide or membrane; with it, she can analyze the expression of thousands of genes at once.
“It’s very tenuous,” she says. “[Imagine that] you have a city at night like Toronto or Paris with a very dense population and lots of buildings, and at 2 a.m. you look at which lights are lit up in all the buildings. Then at 3 and 4 a.m., you look again. So you develop a pattern of which rooms are lit up, and from that you infer who in the city knows who. That’s how sparse the signal is and how far you have to jump, to see which genes are working together. You don’t even have phone connections. But the image of something lighting up is a little bit like the image of microarrays. Microarrays have so much noise, it seems crazy. You just have rustles, whispers of signals, and then lots of noise. You spend a lot of time looking at a lot of data.” Because prior information is needed to assemble the networks, many microarrays are analyzed using Bayesian methods.
Daphne Koller, a leader in artificial intelligence and computational biology at Stanford, also works on microarrays. She wanted to see not only which genes have turned on or off, but also what controls and regulates them. By looking at the activity levels of genes in yeast, she figured out how they are regulated. Then she switched to mouse and human cells to determine the differences in genetic regulation between healthy people and patients with cancer or Type II diabetes, particularly the metabolic (insulin resistance) syndrome.
On the vexed issue of priors, Koller considers herself a relaxed middle-of-the-roader. In contrast, Bayesian purists like Michael I. Jordan of Berkeley and Philip Dawid of Cambridge object to the term “Bayesian networks”; they regard Judea Pearl’s nomenclature as a misnomer because Bayesian networks do not always have priors and Bayes without priors is not Bayes. But Koller insists that her networks fully qualify as Bayesian because she carefully constructs priors for their variables.
Koller’s fascination with uncertainty has led her from genetics to imaging and robotics. Images typically have variable and ambiguous features that are embedded in background clutter. The human visual system sends ten million signals per second to the brain, where a billion neurons strip off random fluctuations and irrelevant, ambiguous information to reveal shape, color, texture, shading, surface reflections, roughness, and other features. As a result, human beings can look at a blurry, distorted, noisy pattern and instantly recognize a tomato plant, a car, or a sheep. Yet a state-of-the-art computer trained to recognize cars and sheep may perceive only nonsensical rectangles. The difference is that the human brain has prior knowledge to integrate with the new images.
“It’s mind-boggling,” says Koller. The problem is not computer hardware; it is writing the software. “A computer can easily be trained to distinguish a desert from a forest, but where the road is and where it’s about to fall off a cliff, that’s much harder.”
To explore such imaging problems, Sebastian Thrun of Stanford built a driverless car named Stanley. The Defense Advanced Research Projects Agency (DARPA) staged a contest with a $2 million prize for the best driverless car; the military wants to employ robots instead of manned vehicles in combat. In a watershed for robotics, Stanley won the competition in 2005 by crossing 132 miles of Nevada desert in seven hours.
While Stanley cruised along at 35 mph, its camera took images of the route and its computer estimated the probability of various obstacles. As the robot navigated sharp turns and cliffs and generally stayed on course, its computer could estimate with 90% probability that a wall stood nearby and with 10% probability that a deep ditch was adjacent. In the unlikely event that Stanley fell into the ditch, it would probably be destroyed. Therefore, like the Bayesian economist who carries an umbrella on sunny days, Stanley slowed down to avoid even unlikely catastrophes. Thrun’s artificial intelligence team trained Stanley’s sensors, machine-learning algorithms, and custom-written software in desert and mountain passes.
Thrun credited Stanley’s victory to Kalman filters. “Every bolt of that car was Bayesian,” Diaconis said proudly. After the race, Stanley retired in glory to its own room in the Smithsonian National Museum of American History in Washington.
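Under the hood, a Kalman filter is sequential Bayes for Gaussian beliefs: each cycle predicts how the state may have drifted, then updates the estimate’s mean and variance as a noisy sensor reading arrives. A bare-bones one-dimensional sketch, nothing like Stanley’s actual software:

```python
def kalman_step(mean, var, measurement, process_var=0.1, meas_var=1.0):
    """One predict-update cycle of a one-dimensional Kalman filter."""
    var += process_var                   # predict: drift inflates uncertainty
    gain = var / (var + meas_var)        # how much to trust the new reading
    mean += gain * (measurement - mean)  # update the belief's mean...
    var *= 1 - gain                      # ...and shrink its variance
    return mean, var

# Track a hypothetical distance-to-obstacle reading over four frames,
# starting from a vague prior.
mean, var = 0.0, 1000.0
for z in (9.8, 10.3, 9.9, 10.1):
    mean, var = kalman_step(mean, var, z)
print(round(mean, 2), round(var, 3))  # the belief settles near 10
```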
The next year a Bayesian team from Carnegie Mellon University and General Motors won another $2 million from DARPA by maneuvering a robot through city traffic while safely avoiding other cars and obeying traffic regulations. Urban planners hope driverless cars can solve traffic congestion. Another Carnegie Mellon team relied on Bayes’ rule and Kalman filters to win international robotic soccer championships involving fast-moving multirobot systems.
The U.S. military is heavily involved in imaging issues. Its Automatic Target Recognition (ATR) technology is a heavy user of Bayesian methods for robotic and electronic warfare, combat vehicles, cruise missiles, advanced avionics, smart weapons, and intelligence, surveillance, and reconnaissance. ATR systems employ radar, satellites, and other sensors to distinguish between, for example, a civilian truck and a missile launcher. Some ATR computer programs start with Bayes’ controversial 50–50 odds, even though these can have a strong impact on rare events and better information may be available. Echoing generations of critics, at least one anonymous ATR analyst regards Bayes as “an affront, a cheap easy trick. It depends on an initial hunch. And yet it turns out to be an effective approximation that seems to solve many of the world’s problems. So Bayes’ rule is wrong . . . except for the fact that it works.” Other approaches have been computationally more expensive and did not produce better answers.
Besides imaging problems, the military involves Bayes in tracking, weapons testing, and antiterrorism. Reagan’s Ballistic Missile Defense applied a Bayesian approach to tracking incoming enemy ballistic missiles. Once it was sufficiently probable that a real missile had been detected, Bayes allowed sensors to communicate only their very latest data instead of recalculating an entire problem from scratch each time. The National Research Council of the National Academy of Sciences strongly urged the U.S. Army to use Bayesian methods for testing weapons systems, specifically the Stryker family of light, armored vehicles. Many military systems cannot be tested at the large sample sizes required by frequentist methods. A Bayesian approach allows analysts to combine test data with information from similar systems and components and from earlier developmental tests.

Terrorist threats are generally estimated with Bayesian techniques. Even before the attacks of September 11, 2001, Digital Sandbox of Tyson’s Corner, Virginia, used Bayesian networks to identify the Pentagon as a possible target. Bayes combined expert and subjective opinions about possible events that had never occurred.
The United States is not the only country trying to predict terrorism. As Britain considered building a national data bank to detect potential terrorists, Bayes raised the same alarm it had against mass HIV screening. Terrorists are so rare that the criteria for identifying one will have to be extremely accurate; otherwise many, many people will be flagged as dangerous when they are not.
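The arithmetic behind the alarm is the familiar base-rate calculation. With invented but generous numbers:

```python
def p_terrorist_given_flag(base_rate, sensitivity, false_positive_rate):
    """Bayes' rule for P(terrorist | flagged by the screen)."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Assume one terrorist per million people and a screen that is 99%
# accurate in both directions -- far better than any real system.
p = p_terrorist_given_flag(1e-6, 0.99, 0.01)
print(f"{p:.6f}")  # ~0.0001: roughly 10,000 false alarms per real hit
```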
On the Internet Bayes has worked its way into the very fiber of modern life. It helps to filter out spam; sell songs, books, and films; search for web sites; translate foreign languages; and recognize spoken words. David Heckerman, who used Bayesian networks to diagnose lymph node diseases for his Ph.D. thesis, has the modern practitioner’s wide-open attitude about Bayes: “The whole thing about being a Bayesian is that all probability represents uncertainty and, anytime you see uncertainty, you represent it with probability. And that’s a lot bigger than Bayes’ theorem.”
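Spam filtering is the most visible of these: a naive Bayes filter multiplies per-word likelihood ratios into posterior odds that a message is spam. A toy sketch, with an invented vocabulary and made-up word probabilities:

```python
import math

# Hypothetical per-word probabilities, as if learned from labeled mail.
P_SPAM = {"free": 0.30, "viagra": 0.10, "meeting": 0.01}
P_HAM = {"free": 0.02, "viagra": 0.001, "meeting": 0.20}

def spam_probability(words, prior=0.5):
    """Naive Bayes: combine each word's likelihood ratio in log space."""
    log_odds = math.log(prior / (1 - prior))
    for w in words:
        if w in P_SPAM:
            log_odds += math.log(P_SPAM[w] / P_HAM[w])
    odds = math.exp(log_odds)
    return odds / (1 + odds)

print(spam_probability(["free", "viagra"]))  # ~0.999
print(spam_probability(["meeting"]))         # ~0.05
```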