The Theory That Would Not Die


by Sharon Bertsch McGrayne


  Most of the attendees were from the military, but Richardson and Stone gave talks about their submarine hunts, while others spoke about search and rescue and oil deposit exploration. Among the civilian attendees was Ray Hilborn, a newly minted zoology Ph.D. who was interested in saving the fish populations in the world’s oceans. He had gotten his first exposure to simple Bayesian applications at the East–West think tank run by Raiffa in Vienna six years earlier.

  Hilborn was struck by the fact that people at the NATO conference dealt with practical problems that required making decisions. His own job involved setting legal limits on fishing for particular species, and, listening to the speeches, he said to himself, “God, this really is the way to ask the questions I want to ask. Everyone who is actually involved in the real world does things in a Bayesian way. The limit of [frequentist] approaches just isn’t obvious until you actually have to make some decisions. You have to be able to ask, ‘What are the alternative states of nature, and how much do I believe they’re true?’ [Frequentists] can’t ask that question. Bayesians, on the other hand, can compare hypotheses.”20 It would take him almost 10 years to find a fisheries problem for Bayes, but Hilborn was a patient man.

  part V

  victory

  16.

  eureka!

  As the computer revolution flooded the modern world with data, Bayes’ rule faced one of its biggest crises in 250 years. Was an eighteenth-century theory—discovered when statistical facts were scarce and computation was slow and laborious—doomed to oblivion? It had already survived five near-fatal blows: Bayes had shelved it; Price published it but was ignored; Laplace discovered his own version but later favored his frequency theory; frequentists virtually banned it; and the military kept it secret.

  By 1980 anyone studying the environment, economics, health, education, or social science was tap-tapping data into a terminal connected to a mainframe computer. “Input” became a verb. Medical records, for example, included dozens of measurements of every patient, ranging from age, gender, and race to blood pressure, weight, heart attacks, and smoking history. Wine statistics included chemical measurements and quality scores for every vintner, varietal, and vintage.

  But who knew which of the 20-odd attributes of a patient or a wine were important? Researchers needed to analyze more than one unknown at once, calculate the relationships among multiple variables, and determine the effect that a change in one had on others. Yet real-life facts did not fall into tidy bell-shaped curves, and each time the variables were refined, more unknowns cropped up. Computers were generating a multivariate revolution and spawning a plague of unknowns called the curse of high dimensionality. Statisticians had to wonder whether a method ideal for tossing a few gold coins could adapt to the new world.

  Bayesians were still a small and beleaguered band of a hundred or more in the early 1980s. Computations took forever, so most researchers were still limited to “toy” problems and trivialities. Models were not complex enough. The title of a meeting held in 1982, “Practical Bayesian Statistics,” was a laughable oxymoron. One of Lindley’s students, A. Philip Dawid of University College London, organized the session but admitted that “Bayesian computation of any complexity was still essentially impossible. . . . Whatever its philosophical credentials, a common and valid criticism of Bayesianism in those days was its sheer impracticability.”1

  The curse of dimensionality plagued both Bayesians and frequentists. Many in the academic statistical community still debated whether to indulge in computer-intensive analysis at all. Most statisticians of the era were mathematicians, and many confused their beloved old calculators—their manual Brunsvigas and electric Facits—with the new electronic computers. They tried to analyze the new data with methods designed for old calculating tools. One statistician boasted that his calculating procedure consisted of marching into his university’s computer center and saying, “Get on with it.”2 Thanks to pioneers like Robert Schlaifer and Howard Raiffa, Bayesians held sway in business schools and theoretical economics, while statistics departments were dominated by frequentists, who focused on data sets with few unknowns rather than on those packed with unknowns.

  As a result, many statistics departments watched from the sidelines as physical and biological scientists analyzed data about plate tectonics, pulsars, evolutionary biology, pollution, the environment, economics, health, education, and social science. Soon engineers, econometricians, computer scientists, and information technologists acquired the cachet that humdrum statisticians seemed to lack. Critics sniffed that statistics departments were isolated, defensive, and on the decline. Leading statistical journals were said to be so mathematical that few could read them and so impractical that few would want to. The younger generation seemed to think that computers and their algorithms could replace mathematics entirely.

  In what could have been a computational breakthrough, Lindley and his student Adrian F. M. Smith showed Bayesians how to develop models by breaking complex scientific processes into stages called hierarchies. The system would later become a Bayesian workhorse, but at the time it fell flat on its face. The models were too specialized and stylized for many scientific applications. It would be another 20 years before Bayesian textbooks taught hierarchical models. Mainstream statisticians and scientists simply did not believe that Bayes could ever be practical. Indicative of their attitude is the fact that, while Thomas Bayes’ clerical ancestors were listed in Britain’s Dictionary of National Biography, he himself was not.

  Yet amazingly, amid these academic doubts, a U.S. Air Force contractor used Bayes to analyze the risk of a Challenger space shuttle accident. The air force had sponsored Albert Madansky’s Bayesian study at the RAND Corporation during the Cold War, but the National Aeronautics and Space Administration (NASA) still distrusted subjective representations of uncertainty. Consequently, it was the air force that sponsored a review in 1983 of NASA’s estimates of the probability of a shuttle failure. The contractor, Teledyne Energy Systems, employed a Bayesian analysis using the prior experience of 32 confirmed failures during 1,902 rocket motor launches. Using “subjective probabilities and operating experience,” Teledyne estimated the probability of a rocket booster failure at 1 in 35; NASA’s estimate at the time was 1 in 100,000. Teledyne, however, insisted that “the prudent approach is to rely on conservative failure estimates based on prior experience and probabilistic analysis.”3 On January 28, 1986, during the shuttle’s twenty-fifth launch, the Challenger exploded, killing all seven crew members aboard.

  The disparity between the military’s sometime acceptance of Bayes and the academic statistical community’s refusal to embrace it is still puzzling. Did the military’s top-secret experience with Bayes during the Second World War and the Cold War give it confidence in the method? Was the military less afraid of using computers? Or did it simply have easier access to powerful ones? Given that many sources dealing with the Second World War and the Cold War are still classified, we may never know the answers to these questions.

  Several civilian researchers tackling hitherto intractable problems concerning public health, sociology, epidemiology, and image restoration did experiment during the 1980s with computers for Bayes. A major controversy about the effect of diesel engine emissions on air quality and cancer inspired the first attempt. By the 1980s cancer specialists had solid data about the effects of cigarette smoke on people, laboratory animals, and cells but little accurate information about diesel fumes. William H. DuMouchel from MIT’s mathematics department and Jeffrey E. Harris from its economics department and Massachusetts General Hospital teamed up in 1983 to ask, “Could you borrow and extrapolate and take advantage of information from non-human species for humans?”4 Such meta-analyses, combining the results of similar trials, were too complex for frequentists to address, but DuMouchel was a disciple of Smith and his hierarchical work with Lindley. Harris was not a statistician and did not care what method he used as long as it answered the question. Adopting hierarchical Bayes, they borrowed information from laboratory tests on mice, hamster embryo cells, and chemical substances. They even incorporated experts’ opinions about the biological relevance of nonhumans to humans and of cigarette to diesel smoke. Bayes let them account formally for their uncertainties about combining information across species.
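
  To make the borrowing concrete, here is a minimal sketch in Python of the kind of hierarchical (random-effects) model that underlies such a meta-analysis. Every number in it is invented for illustration, and it omits the expert-opinion layer entirely; it is not DuMouchel and Harris’s data or model.

```python
import numpy as np

# Illustrative effect estimates (say, log relative risk per unit of exposure)
# from several nonhuman assays plus one noisy human study, with standard errors.
# These numbers are invented; they are not DuMouchel and Harris's data.
y = np.array([0.40, 0.55, 0.30, 0.25, 0.10])   # observed effects, one per "study"
s = np.array([0.10, 0.15, 0.20, 0.25, 0.50])   # standard errors (human study last)

# Hierarchical model:  y_i ~ Normal(theta_i, s_i^2),  theta_i ~ Normal(mu, tau^2),
# with a flat prior on mu and a uniform grid prior on the between-study spread tau.
taus = np.linspace(1e-3, 1.0, 400)
log_post = np.empty_like(taus)
for k, tau in enumerate(taus):
    v = s**2 + tau**2                      # variance of y_i given mu and tau
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)     # precision-weighted estimate of mu
    # log p(tau | data) up to a constant, with mu integrated out analytically
    log_post[k] = (0.5 * np.log(1.0 / np.sum(w))
                   - 0.5 * np.sum(np.log(v))
                   - 0.5 * np.sum(w * (y - mu_hat) ** 2))
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Posterior mean of the human effect: the noisy human estimate is shrunk toward
# the pooled mean, borrowing strength from the animal studies.
shrunk = 0.0
for tau, p_tau in zip(taus, post):
    v = s**2 + tau**2
    w = 1.0 / v
    mu_hat = np.sum(w * y) / np.sum(w)
    b = s[-1]**2 / (s[-1]**2 + tau**2)     # how much the human study defers to the pool
    shrunk += p_tau * ((1 - b) * y[-1] + b * mu_hat)

print(f"raw human estimate: {y[-1]:.2f}   hierarchical estimate: {shrunk:.2f}")
```

  The last line shows the payoff of the hierarchy: the weakest study no longer stands alone but is pulled toward what all the studies jointly indicate, by an amount the data themselves determine.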

  Microcomputers were not widely available. Many of the researchers studying the new acquired immune deficiency syndrome (AIDS) epidemic, for example, were making statistical calculations by hand, and mathematical shortcuts were still being published for them. Harris programmed the diesel project in APL, a language used for matrix multiplications, and sent it via teletype to MIT’s computer center. He drew illustrations on poster boards, added captions by pressing on wax letters, and arranged for an MIT photographer to take their pictures.

  Thanks to mice and hamster studies, DuMouchel and Harris were able to conclude that even if light-duty diesel vehicles captured a 25% market share over 20 years, the risk of lung cancer would be negligible for the typical urban resident compared to the typical pack-a-day cigarette smoker. The smoker’s risk was 420,000 times worse. Today, Bayesian meta-analyses are statistically old hat, but DuMouchel and Harris made Bayesians salivate for more big-data methods—and for the computing power to deal with them.

  While lung cancer researchers explored Bayes, Adrian Raftery was working at Trinity College in Dublin on a well-known set of statistics about fatal coal-dust explosions in nineteenth-century British mines. Previous researchers had used frequency techniques to show that coal mining accident rates had changed over time. They assumed, however, that the change had been gradual. Raftery wanted to check whether it had been gradual or abrupt. First, he developed some heavy frequentist mathematics for analyzing the data. Then, out of curiosity, he experimented with Bayes’ rule, comparing a variety of theoretical models to see which was most probable, and thus when the accident rates actually changed. “I found it very easy. I just solved it very, very quickly,” Raftery recalled. And in doing so he discovered a remarkable, hitherto unknown event in British history. Raftery’s Bayesian analysis revealed that accident rates plummeted suddenly in the late 1880s or early 1890s. A historian friend suggested why. In 1889, British miners had formed the militant Miners’ Federation (which later became the National Union of Mineworkers). Safety was their number one issue. Almost overnight, coal mines got safer.

  “It was a Eureka moment,” Raftery said. “It was quite a thrill. And without Bayesian statistics, it would have been much harder to do a test of this hypothesis.”5 Frequency-based statistics worked well when one hypothesis was a special case of the other and both assumed gradual behavior. But when hypotheses were competing and neither was a special case of the other, frequentism was not as helpful, especially with data involving abrupt changes—like the formation of a militant union.
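
  A minimal sketch of that kind of comparison, in Python, looks like this: two Poisson models for yearly accident counts, one with a single constant rate and one with an abrupt change at an unknown year, scored by their marginal likelihoods under conjugate Gamma priors. The counts below are synthetic stand-ins, not the actual British coal-mining series, and the priors are arbitrary choices for illustration rather than anything Raftery used.

```python
import numpy as np
from scipy.special import gammaln

# Synthetic yearly accident counts standing in for the real British series:
# a higher Poisson rate for the first 40 "years", a lower one afterwards.
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(3.0, 40), rng.poisson(1.0, 40)])

def log_marginal(y, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts y under a single unknown rate
    with a conjugate Gamma(a, b) prior (shape a, rate b)."""
    n, total = len(y), y.sum()
    return (a * np.log(b) - gammaln(a)
            + gammaln(a + total) - (a + total) * np.log(b + n)
            - gammaln(y + 1).sum())

# Model 0: one constant accident rate for the whole series.
log_m0 = log_marginal(counts)

# Model 1: an abrupt change at some unknown year tau, uniform prior over tau.
log_segments = [log_marginal(counts[:t]) + log_marginal(counts[t:])
                for t in range(1, len(counts))]
log_m1 = np.logaddexp.reduce(log_segments) - np.log(len(log_segments))

print(f"log Bayes factor, abrupt change vs. constant rate: {log_m1 - log_m0:.1f}")
print(f"most probable change year (index): {1 + int(np.argmax(log_segments))}")
```

  The Bayes factor weighs the two hypotheses directly against each other, which is exactly what the frequentist machinery struggled to do when neither model was a special case of the other.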

  Raftery wound up publishing two papers in 1986 about modeling abrupt rate changes. His first paper, the frequentist one, was long, dense, and virtually unread. His second, the Bayesian one, was shorter and simpler and had far greater impact. Raftery’s third 1986 paper ran just a page and a quarter and had an immediate effect on sociologists. The article appeared just as many sociologists were about to give up on frequentism’s controversial p-values. A typical sociologist might work with data sets about thousands of individuals, each with hundreds of variables such as age, race, religion, climate, and family structure. Unfortunately, when researchers tried to determine the relevance of those variables using frequentist methods developed by Karl Pearson and R. A. Fisher for 50 to 200 cases, the results were often bizarre. Obscure effects appeared important, or went in opposite directions, or were disproved by later studies. By selecting a single model for large samples, frequentists ignored uncertainties about the model. Yet few social scientists could repeat their surveys or rerun experiments under precisely the same conditions. By the early 1980s many sociologists had concluded that, for testing hypotheses, their intuition was more accurate than frequentism.

  Bayes, on the other hand, seemed to produce results that corresponded more closely to sociologists’ intuition. Raftery told his colleagues, “The point is that we should be comparing the models, not just looking for possibly minor discrepancies between one of them and the data.”6 Researchers really want to know which of their models is more likely to be true, given the data. With Bayes, researchers could study sudden shifts from one stable form to another in biological growth phases, trade deficits and economic behavior, the abandonment and resettlement of archaeological sites, and clinical conditions such as rejection and recovery in organ transplantation and brain waves in Parkinson’s disease. Bayesian hypothesis testing swept sociology and demography, and Raftery’s short paper is still among the most cited in sociology.
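
  As a rough illustration of that style of reasoning, the sketch below compares two regression models on simulated survey-style data using the BIC approximation to the Bayes factor, a shortcut Raftery later promoted among social scientists. The data, variables, and effect sizes are invented; nothing here comes from a real survey.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated survey-style data: a large sample, one variable with a real effect
# and one whose effect is so small it is of no practical interest.
n = 50_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.01 * x2 + rng.normal(size=n)

def bic(X, y):
    """BIC of an ordinary least-squares fit with Gaussian errors."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)             # maximum-likelihood error variance
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                          # coefficients plus the error variance
    return k * np.log(len(y)) - 2 * loglik

bic_without = bic(x1.reshape(-1, 1), y)                 # model that omits x2
bic_with    = bic(np.column_stack([x1, x2]), y)         # model that includes x2

# BIC approximates -2 log p(data | model), so the difference between the two
# models gives an approximate Bayes factor between them.
bf_drop_x2 = np.exp((bic_with - bic_without) / 2)
print(f"approximate Bayes factor in favor of omitting x2: {bf_drop_x2:.1f}")
```

  With a sample this large, a tiny extra coefficient can easily look statistically significant, yet the approximate Bayes factor answers the question sociologists actually cared about: which model is more probable given the data.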

  Meanwhile, image processing and analysis had become critically important for the military, industrial automation, and medical diagnosis. Blurry, distorted, imperfect images were coming from military aircraft, infrared sensors, ultrasound machines, photon emission tomography, magnetic resonance imaging (MRI) machines, electron micrographs, and astronomical telescopes. All these images needed signal processing, noise removal, and deblurring to make them recognizable. All were inverse problems ripe for Bayesian analysis.

  The first known attempt to use Bayes to process and restore images involved nuclear weapons testing at Los Alamos National Laboratory. Bobby R. Hunt suggested Bayes to the laboratory and used it in 1973 and 1974. The work was classified, but during this period he and Harry C. Andrews wrote a book, Digital Image Restoration, about the basic methodology; the laboratory declassified the book and approved its publication in 1976. The U.S. Congress retained Hunt in 1977 and 1978 to analyze images of the shooting of President Kennedy. In his testimony, Hunt did not refer to Bayes. “Too technical for a Congressional hearing,” he said later.

  At almost the same time that Hunt was working on image analysis for the military, Julian Besag at the University of Durham in England was using diseased tomato plants to study the spread of epidemics. Bayes helped him discern local regularities and neighborly interactions among plants growing in pixel-like lattice systems. Looking at one pixel, Besag realized he could estimate the probability that its neighbor might share the same color, a useful tool for image enhancement. But Besag was not a card-carrying Bayesian, and his work went largely unnoticed at the time.

  A group of researchers with Ulf Grenander at Brown University was trying to design mathematical models for medical imaging by exploring the effect one pixel could have on a few of its neighbors. The calculations involved easily a million unknowns. Grenander thought that once Bayes was embedded in a realistic problem, philosophical objections to it would fade.

  Stuart Geman was attending Grenander’s seminar in pattern theory, and he and his brother Donald Geman tried restoring a blurry photograph of a roadside sign. The Gemans were interested in noise reduction and in finding ways to capture and exploit regularities to sharpen the lines and edges of unfocused images. Stuart had majored in physics as an undergraduate and knew about Monte Carlo sampling techniques. So the Geman brothers invented a variant of Monte Carlo that was particularly suited to imaging problems with lots of pixels and lattices.

  Sitting at a table in Paris, Donald Geman thought about naming their system. A popular Mother’s Day gift at the time was a Whitman’s Sampler assortment of chocolate bonbons; a diagram inside the box top identified the filling hidden inside each candy. To Geman, the diagram was a matrix of unknown but enticing variables. “Let’s call it Gibbs sampler,” he said, after Josiah Willard Gibbs, a nineteenth-century American physicist who applied statistical methods to physical systems.7

  The dots were starting to connect. But the Gemans, like Besag, operated in a small niche field, spatial statistics. And instead of nibbling at their problem a pixel at a time, the Gemans tried gobbling it whole. Working at pixel levels on a 64 × 64-cell fragment of a photo, they produced too many unknowns for computers of the day to digest. They wrote up their sampler in a formidably difficult paper and published it in 1984 in IEEE Transactions on Pattern Analysis and Machine Intelligence. Specialists in image processing, neural networks, and expert systems quickly adopted the method, which, with computers gaining more power every year, also sparked the interest of some statisticians. The brothers spent the next year racing around the globe giving invited talks.
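
  For readers who want to see what sampling an image one pixel at a time looks like, here is a minimal Gibbs sampler sketch in Python that cleans up a noisy 64 × 64 binary image using a simple Ising-style prior that rewards neighboring pixels for agreeing. The image, the noise level, and the settings are invented and far cruder than the Gemans’ actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy binary image: a bright square on a dark background,
# with 15% of pixels flipped (a toy stand-in for a blurry roadside sign).
H = W = 64
truth = -np.ones((H, W))
truth[16:48, 16:48] = 1.0
flip = rng.random((H, W)) < 0.15
noisy = np.where(flip, -truth, truth)

beta = 1.0                                  # strength of neighbor agreement (Ising prior)
eta = 0.5 * np.log(0.85 / 0.15)             # evidence weight from the 15% noise model

x = noisy.copy()
for sweep in range(20):                     # Gibbs sweeps: visit every pixel in turn
    for i in range(H):
        for j in range(W):
            # Sum of the four nearest neighbors (edge pixels simply have fewer).
            s = 0.0
            if i > 0:     s += x[i - 1, j]
            if i < H - 1: s += x[i + 1, j]
            if j > 0:     s += x[i, j - 1]
            if j < W - 1: s += x[i, j + 1]
            # Conditional probability that this pixel is +1 given its neighbors
            # and its observed noisy value -- the Gibbs sampler's one-pixel update.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * (beta * s + eta * noisy[i, j])))
            x[i, j] = 1.0 if rng.random() < p_plus else -1.0

print("pixels disagreeing with the truth:", int(np.sum(x != truth)), "of", H * W)
```

  Each update consults only a pixel’s four neighbors and its observed value, which is what made the scheme tractable on lattice problems like these.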

  Donald Geman used the Gibbs sampler to improve satellite images; Stuart used it for medical scans. Several years later, statisticians outside the small spatial imaging community began to realize that more general versions could be useful. The Gibbs sampler’s flexibility and reliability would make it the most popular Monte Carlo algorithm. Still later the West learned that a Russian dissident mathematician, Valentin Fedorovich Turchin, had discovered the Gibbs sampler in 1971, but his work had been published in Russian-language journals, did not involve computers, and was overlooked.

  By 1985 the old argument between Bayesians and frequentists was losing its polarizing zing, and Glenn Shafer of Rutgers University thought it had “calcified into a sterile, well-rehearsed argument.” Persi Diaconis made a similar but nonetheless startling observation, one that no one familiar with the battles between Bayesians, Karl Pearson, Ronald Fisher, and Jerzy Neyman could have imagined. “It’s nice that our field is so noncompetitive,” Diaconis said. “If you take many other fields, like biology, people just slice each other up.”8

  Still, the conviction remained that without more powerful and accessible computers and without user-friendly and economical software, computing realistic problems with Bayes was impossible.

  Lindley had been programming his own computers since 1965 and regarded Bayes as ideal for computing: “One just feeds in the axioms and the data and allows the computer to follow the laws of arithmetic.” He called it “turning the Bayesian crank.” But his student Smith saw something the older man did not: the key to making Bayes useful in the workplace would be computational ease, not more polished theory. Later Lindley wrote, “I consider it a major mistake of my professional life, not to have appreciated the need for computing rather than mathematical analysis.”9

 
