The Theory That Would Not Die
In 1991 a public frightened about the AIDS epidemic demanded universal screening for human immunodeficiency virus (HIV). Biostatisticians quickly used Bayes to demonstrate that screening the entire population for a rare disease would be counterproductive. Wesley O. Johnson and Joseph L. Gastwirth showed that a sensitive test like that for HIV would tell many patients they were infected when in fact they were not. The media publicized several suicides of people who had received a positive HIV test result without realizing they did not necessarily have the virus. Scaring healthy people and retesting them with more sophisticated procedures would have been extremely costly.
In much the same way, but more controversially, a Bayesian approach showed that an expensive MRI test for breast cancer might be appropriate for a woman whose family had many breast cancer patients but inappropriate for every woman between 40 and 50 years of age. A woman who has a mammogram every year for 10 years can be almost 100% sure of getting at least one false positive result, and the resulting biopsy can cost $1,000 to $2,000. In the case of prostate cancer, the screening test for high blood levels of prostate-specific antigen (PSA) is highly accurate when it comes to identifying men with the cancer. Yet the disease is so rare that almost everyone who gets a positive test result is found not to have the cancer at all. (See appendix B for how to calculate a Bayesian problem involving breast cancer.)
On the other hand, Bayes also showed that people with negative test results for breast and prostate cancer cannot feel carefree. The PSA test is so insensitive that good news provides almost no assurance that a man does not actually have prostate cancer. The same is true to a lesser extent of mammography: its sensitivity is about 85 to 90%, meaning that a woman who finds a lump only a few months after getting a negative mammogram should still see a doctor immediately. A strict Bayesian gives patients their probabilities for cancer instead of a categorical yes or no.
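The arithmetic behind these screening results follows directly from Bayes' rule. The sketch below uses invented numbers (a prevalence of 1 in 1,000, 99% sensitivity, 95% specificity) purely to illustrate the rare-disease effect; it is not a model of any actual HIV, mammography, or PSA test.

```python
# Illustrative Bayes'-rule screening calculation.
# All numbers are hypothetical, chosen to show why screening a whole
# population for a rare disease yields mostly false positives.

def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), by Bayes' theorem."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity        # false-positive rate
    p_pos = (p_pos_given_disease * prevalence
             + p_pos_given_healthy * (1.0 - prevalence))
    return p_pos_given_disease * prevalence / p_pos

def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(no disease | negative test)."""
    p_neg_given_healthy = specificity
    p_neg_given_disease = 1.0 - sensitivity        # false-negative rate
    p_neg = (p_neg_given_healthy * (1.0 - prevalence)
             + p_neg_given_disease * prevalence)
    return p_neg_given_healthy * (1.0 - prevalence) / p_neg

# A rare disease (1 in 1,000) with a seemingly excellent test:
ppv = positive_predictive_value(0.001, 0.99, 0.95)
npv = negative_predictive_value(0.001, 0.99, 0.95)
print(f"P(disease | positive) = {ppv:.3f}")
print(f"P(healthy | negative) = {npv:.5f}")
```

With these illustrative numbers, fewer than 2% of positive results come from people who actually have the disease, which is precisely the effect Johnson and Gastwirth demonstrated.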
Because genetics involves extremely rare diseases, imperfect tests, and complicated problems where tiny errors in the data or calculations can affect decisions, Bayesian probabilities are expected to become increasingly important for diagnostic test assessment.
Spiegelhalter spent more than 10 years trying to sell the medical community on BUGS as the mathematical way to learn from experience. He argued that “advances in health-care typically happen through incremental gains in knowledge rather than paradigm-shifting breakthroughs, and so this domain appears particularly amenable to a Bayesian perspective.” He contended that “standard statistical methods are designed for summarizing the evidence from single studies or pooling evidence from similar studies, and have difficulties dealing with the pervading complexity of multiple sources of evidence.”21 While frequentists can ask only certain questions, a Bayesian can frame any question.
With the introduction of high-performance workstations in the 1980s it became possible to use Bayesian networks to handle medicine’s many interdependent variables, such as the fact that a patient with a high temperature will usually also have an elevated white blood count. Bayesian networks are graphs of nodes with links revealing cause-and-effect relationships. The “nets” search for particular patterns, assign probabilities to parts of the pattern, and update those probabilities using Bayes’ theorem. A number of people helped develop Bayesian networks, which were popularized in 1988 in a book by Judea Pearl, a computer scientist at UCLA. By treating cause and effect as a quantifiable Bayesian belief, Pearl helped revive the field of artificial intelligence.
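A network of the sort Pearl popularized can be sketched in a few lines. The toy model below is entirely hypothetical: one cause node, Disease, with two effect nodes, Fever and HighWBC, assumed conditionally independent given the disease, and all probabilities invented for illustration.

```python
# Minimal two-symptom Bayesian network sketch (hypothetical numbers).
# Structure: Disease -> Fever, Disease -> HighWBC.

p_disease = 0.01                        # prior P(Disease)
p_fever = {True: 0.90, False: 0.05}     # P(Fever | Disease = d)
p_wbc   = {True: 0.80, False: 0.10}     # P(HighWBC | Disease = d)

def posterior(fever_seen, wbc_seen):
    """P(Disease | observed symptoms), updated by Bayes' theorem."""
    def likelihood(d):
        lf = p_fever[d] if fever_seen else 1 - p_fever[d]
        lw = p_wbc[d] if wbc_seen else 1 - p_wbc[d]
        return lf * lw                  # conditional independence given d
    num = likelihood(True) * p_disease
    den = num + likelihood(False) * (1 - p_disease)
    return num / den

print(posterior(True, True))    # both symptoms observed: belief jumps
print(posterior(False, False))  # neither observed: belief collapses
```

Observing both correlated symptoms raises the 1% prior above 50%, while observing neither drives it nearly to zero; real diagnostic networks do the same propagation over hundreds of interdependent nodes.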
Ron Howard, who had become interested in Bayes while at Harvard, was working on Bayesian networks in Stanford's engineering-economic systems department. A medical student, David E. Heckerman, became interested too and for his Ph.D. dissertation wrote a program to help pathologists diagnose lymph node diseases. Computerized diagnostics had been tried but abandoned decades earlier. Heckerman's Ph.D. in bioinformatics concerned medicine, but his software won a prestigious national award in 1990 from the Association for Computing Machinery, the professional organization for computing. Two years later, Heckerman went to Microsoft to work on Bayesian networks.
The Food and Drug Administration (FDA) allows the manufacturers of medical devices to use Bayes in their final applications for FDA approval. Devices include almost any medical item that is not a drug or biological product, items such as latex gloves, intraocular lenses, breast implants, thermometers, home AIDS kits, and artificial hips and hearts. Because they are usually applied locally and improved a step at a time, new models of a device should come equipped with objective prior information.
Pharmaceuticals are different. Unlike devices, pharmaceuticals are generally one-step, systemic discoveries so that potentially an industry could subjectively bias Bayes’ prior hunch. Thus the FDA has long resisted pressure from pharmaceutical companies that want to use Bayes when applying for approval to sell a drug in the United States.
According to Spiegelhalter, however, the same battle seems to have subsided in England. Drug companies use WinBUGS extensively when submitting their pharmaceuticals for reimbursement by the English National Health Service. The process is, in Spiegelhalter’s words, “very Bayesian without using the B-word” because it uses judgmental priors about a drug’s cost effectiveness. International guidelines also allow Bayesian drug applications, but these guidelines are widely considered too vague to be effective.
Outside of diagnostic and medical device testing, Bayes’ mathematical procedures have had little impact on basic clinical research or practice. Working doctors have always practiced an intuitive, nonmathematical form of Bayes for diagnosing patients. The biggest unknown in medicine, after all, is the question, What is causing the patient’s symptoms? But traditional textbooks were organized by disease. They said that someone with, for instance, measles probably has red spots. But the doctor with a speckled patient wanted to know the inverse: the probability that the patient with red spots has measles. Simple Bayesian problems—for example, What is the probability that an exercise echocardiogram will predict heart disease?—started appearing on physicians’ licensing examinations in 1992.
One of the few times physicians make even rough Bayesian calculations occurs when a patient has symptoms that could involve a life-threatening heart attack, a deep venous thrombosis, or a pulmonary embolism. To assess the danger, a physician assigns points for each of the patient’s risk factors and adds the points. In the heart attack algorithm, the score determines the probability that within the next two weeks the patient will die, have a heart attack, or need coronary arteries opened. The points for thrombosis and embolism tell whether a patient has a low, medium, or high risk of developing a clot and which test can best produce a diagnosis. It is expected that software will be available soon to automatically tell doctors and patients the effect of a particular test result on a diagnosis.
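The point-adding procedure described above can be sketched directly. The risk factors, point values, and thresholds below are hypothetical, invented only to show the shape of such an algorithm; they are not those of any actual clinical score.

```python
# Sketch of a point-score risk algorithm of the kind described above.
# Factor names, weights, and thresholds are hypothetical.

RISK_FACTORS = {
    "age_over_65": 1,
    "prior_heart_disease": 1,
    "chest_pain_episodes": 1,
    "abnormal_ecg": 1,
    "elevated_markers": 1,
}

def risk_category(patient_factors):
    """Add a point for each factor present, then map the total to a tier."""
    score = sum(RISK_FACTORS[f] for f in patient_factors)
    if score <= 1:
        return score, "low"
    if score <= 3:
        return score, "medium"
    return score, "high"

print(risk_category({"age_over_65", "abnormal_ecg"}))
```

The score is a crude stand-in for a posterior probability: each factor nudges the physician's belief about the patient's risk, which is why the procedure counts as a rough Bayesian calculation.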
Outside of medicine, endangered populations of ocean fish, whales, and other mammals were among the first to benefit from Bayes’ new computational heft. Despite the U.S. Marine Mammal Protection Act of 1972, only a few visible and highly publicized species of whales, dolphins, and other marine mammals had been protected. Some exploited populations, including several whale species in the Antarctic, collapsed while being “managed.” Having strong, abundant information about a species, frequentists and Bayesians could reach similar decisions, but when evidence was weak—as it often is in the case of marine mammals—only Bayes incorporated uncertainties about the data at hand and showed clearly when more information was needed.
Most whale populations rebounded during the 1980s, but in 1993 two government biologists, Barbara L. Taylor and Timothy Gerrodette, wrote, “At least part of the blame for the spectacular [past] overexploitation of the great whales can be placed on scientists being unable to agree . . . [on a] clear way to treat uncertainty. . . . In certain circumstances, a population might go extinct before a significant decline could be detected.”22 During the administration of Bill Clinton, the Marine Mammal Protection Act was amended to accept Bayesian analyses alerting conservationists early to the need for more data.
Scientists advising the International Whaling Commission were particularly worried about the uncertainty of their measurements. Each year the commission establishes the number of endangered bowhead whales Eskimos can hunt in Arctic seas. To ensure the long-term survival of the bowheads, scientists compute 2 numbers each year: the number of bowheads and their rate of increase. The whales, perhaps the longest-lived mammals on Earth, can grow to more than 60 feet in length, weigh more than 60 tons, and eat 2 tons of food a day. They spend only about 5% of their time on the ocean surface because they can submerge for 30 minutes at a time, and they use their enormous heads to ram through ice when they need to surface for air. In spring, teams of scientists stood on tall perches to spot bowheads rounding Point Barrow, Alaska, on their annual migration into the western Arctic. The count was fraught with uncertainties.
Scientists representing an entire spectrum of opinion, from Greenpeace to whaling nations, worried that a lack of trustworthy data on bowhead populations was opening the species to too much risk. During a weeklong meeting to discuss the problem in 1991, their chair asked, “What can we do?”23 There was complete silence. The scientists were the world’s leading bowhead experts, but none of them could answer the question.
When Judith Zeh, the committee chair, got back to the department of statistics at the University of Washington in Seattle, she talked with Raftery, who had recently moved there from Dublin. Not surprisingly, after his experience analyzing coal mining accidents, Raftery thought Bayes might help. Using it, the committee could assign uncertainties to all their data and augment visual sightings with recordings of whales vocalizing near underwater hydrophones.
Providentially, the spring of 1993 was a rewarding year for bowhead counting, and sightings plus vocalizing showed that the whales were almost assuredly increasing at a healthy rate. Their recovery indicated that protecting other great whale populations from commercial whaling might help them recover too.
The entire process—involving rival Bayesian and frequentist methods and whaling factions that often profoundly disagreed—could have been wildly contentious. But times were changing. Pragmatism ruled. Full-scale Bayesian analyses combining visual and acoustic data were expensive, and because they confirmed previous frequentist studies, they were discontinued. Raftery moved on to using Bayes for 48-hour weather forecasting.
Other wildlife researchers picked up the Bayesian banner. When Paul R. Wade decided in 1988 to use Bayes for his Ph.D. thesis, he said, “I was off in this small area of marine mammal biology but I felt as if I were in the center of a revolution in science.” Ten years later, at the National Oceanic and Atmospheric Administration, he was comparing frequentist and Bayesian analyses of a small, isolated population of 200 or 300 beluga whales in the Arctic and Sub-Arctic waters of Cook Inlet, Alaska. The legal take by native hunters was roughly 87 whales a year. Frequentist methods would have required seven years of data collection to assess whether this catch was sustainable. With Bayes, five years of data showed that the beluga population was almost certainly declining substantially, and the experiment could stop. “With a small population, even a two-year delay can be important,” Wade said.24 In May 1999 a hunting moratorium went into effect for the Cook Inlet belugas.
Meanwhile, a committee of the National Research Council in the National Academy of Sciences strongly recommended the aggressive use of Bayesian methods to improve estimates of marine fish stocks too. Committee members emphasized in 1998 that, because the oceans are vast and opaque, wildlife managers need realistic measurements of the uncertainties in their observations and models. Otherwise, policymakers cannot gauge potential risks to wildlife. Today many fisheries journals demand Bayesian analyses.

Lindley had predicted that the twenty-first century would be a Bayesian era because the superior logic of Bayes’ rule would swamp frequency-based methods. David Blackwell at Berkeley disagreed, saying, “If the Bayesian approach does grow in the statistical world, it will not be because of the influence of other statisticians but because of the influence of actuaries, engineers, business people, and others who actually like the Bayesian approach and use it.”25 It appeared that Blackwell was right: pragmatism could drive a paradigm shift. Philosophies of science had not changed. The difference was that Bayes finally worked.
Diaconis had been wondering for years, “When is our time?” In 1997, he decided, “Our time is now.”26
Smith became the first Bayesian president of the Royal Statistical Society in 1995. Three years later he stunned his friends by quitting statistics to become an administrator of the University of London. A proponent of evidence-based medicine, he wanted to help develop evidence-based public policy too. Dismayed colleagues chastised him for abandoning Bayes’ rule. But Smith told Lindley that all the problems of statistics had been solved. We have the paradigm, he said, and with MCMC we know how to implement it. He told Diaconis that there was nothing else to do with statistical problems but to plug them into a computer and turn the Bayesian crank.
In 2008, when Smith became scientific adviser to the United Kingdom’s Minister of Innovation, Universities, and Skills, a Royal Society spokesman volunteered that three statisticians have become prime ministers of Great Britain.27
17.
rosetta stones
Two and a half centuries after Bayes and Laplace discovered a way to apply mathematical reasoning to highly uncertain situations, their method has taken wing, soaring through science and the Internet, burrowing into our daily lives, dissolving language barriers, and perhaps even explaining our brains. Gone are the days when a few driven individuals searched orphanages and coded messages for data and organized armies of women and students to make tedious calculations. Today’s Bayesians revel in vast archives of Internet data, off-the-shelf software, tools like MCMC, and computing power so cheap it is basically free.
The battle between Bayesian and frequentist forces has cooled. Bayesianism as an all-encompassing framework has been replaced by utilitarian applications and computation. Computer scientists who joined the Bayesian community cared about results, not theory or philosophy. And even theorists who once insisted on adhering strictly to fundamental principles now accept John Tukey’s view from the 1950s: “Far better an approximate answer to the right question, . . . than an exact answer to the wrong question.” Researchers adopt the approach that best fits their needs.
In this ecumenical atmosphere, two longtime opponents—Bayes’ rule and Fisher’s likelihood approach—ended their cold war and, in a grand synthesis, supported a revolution in modeling. Many of the newer practical applications of statistical methods are the results of this truce.
As a collection of computational and statistical machinery, Bayes is still driven by Bayes’ rule. The word “Bayes” still entails the idea, shared by de Finetti, Ramsey, Savage, and Lindley, that probability is a measure of belief and that it can, as Lindley put it, “escape from repetition to uniqueness.” That said, most modern Bayesians accept that the frequentism of Fisher, Neyman, and Egon Pearson is still effective for most statistical problems: for simple and standard analyses, for checking how well a hypothesis fits data, and as the foundation of many modern technologies in areas such as machine learning.
Prominent frequentists have also moderated their positions. Bradley Efron, a National Medal of Science recipient who wrote a classic defense of frequentism in 1986, recently told a blogger, “I’ve always been a Bayesian.” Efron, who helped develop empirical Bayesian procedures while remaining a committed frequentist, told me that Bayes is “one of the great branches of statistical inference. . . . Bayesians have gotten more tolerant these days, and frequentists are seeing the need to use Bayesian kinds of reasoning, so maybe we are headed for some kind of convergence.”
Bayes’ rule is influential in ways its pioneers could never have envisioned. “Neither Bayes nor Laplace,” Robert E. Kass of Carnegie Mellon observed, “recognized a fundamental consequence of their approach, that the accumulation of data makes open-minded observers come to agreement and converge on the truth. Harold Jeffreys, the modern founder of Bayesian inference for scientific investigation, did not appreciate its importance for decision making. And the loyalists of the 1960s and 1970s failed to realize that Bayes would ultimately be accepted, not because of its superior logic, but because probability models are so marvelously adept at mimicking the variation in real-world data.”
Bayes has also broadened to the point where it overlaps computer science, machine learning, and artificial intelligence. It is empowered by techniques developed both by Bayesian enthusiasts during their decades in exile and by agnostics from the recent computer revolution. It allows its users to assess uncertainties when hundreds or thousands of theoretical models are considered; combine imperfect evidence from multiple sources and make compromises between models and data; deal with computationally intensive data analysis and machine learning; and, as if by magic, find patterns or systematic structures deeply hidden within a welter of observations. It has spread far beyond the confines of mathematics and statistics into high finance, astronomy, physics, genetics, imaging and robotics, the military and antiterrorism, Internet communication and commerce, speech recognition, and machine translation. It has even become a guide to new theories about learning and a metaphor for the workings of the human brain.
One of the surprises is that Bayes, as a buzzword, has become chic. Stanford University biologist Stephen H. Schneider wanted a customized cancer treatment, called his logic Bayesian, got his therapy, went into remission, and wrote a book about the experience. Stephen D. Unwin invented a personal “faith-belief factor” of 28% to boost the 67% “Bayesian probability” that God exists to 95%, and his book hit the bestseller list. A fashionable expression, “We’re all Bayesians now,” plays on comments made years ago by Milton Friedman and President Richard Nixon that “We’re all Keynesians now.” And the CIA agent in a Robert Ludlum thriller tells the hero, “Lucky? Obviously you haven’t heard anything I’ve said. It was a matter of applying Bayes’ Theorem to estimate the conditional probabilities. Giving due weight to the prior probabilities and . . .”1