The Theory That Would Not Die

by Sharon Bertsch McGrayne


  At first, his book fell on deaf ears. He was unaccustomed to teaching or explaining his ideas, and no one knew he had used Bayes to help break the Enigma codes. When he gave a talk about his “neo-Bayesian or neo/Bayes-Laplace philosophy” at a Royal Statistical Society conference, his style was clipped, and he did not waste words.9 Lindley, who was in the audience, reported, “He did not get his ideas across to us. We should have paid much more respect to what he was saying because he was way in advance of us in many ways.”10

  After the war Good continued doing classified cryptography for the British government and frequently used equal priors to help decide what hypotheses he should follow up. When David Kahn’s bestseller The Codebreakers was published in 1967, the National Security Agency censored a passage identifying Good as one of Britain’s top three cryptanalysts. He was at the time one of the world’s most knowledgeable people about the coding industry. Good was quick, smart, original, armed with a fabulous memory, and unconventional enough to think about the paranormal and astrology and to join Mensa, the organization for people with high IQs. He introduced himself with a handshake and the words, “I am Good.”11

  From the Second World War on, everything technical about cryptography was classified, and while Good obeyed the restraints, he chafed against them and looked for ways to evade censorship. To reveal an ultraclassified technique used by Turing to find pairs and triplets of letters indicating the German submariners’ code of the day, Good wrote about a favorite British hobby, bird watching. What if, he suggested, an avid bird watcher spotted 180 different bird species? Many of them would be represented by only one bird; logically, the bird watcher must have entirely missed many other species. Counting those missing species as zero (as a frequentist would have done) has the deleterious effect of asserting that missing species can never be found. Turing decided to assign those missing species a tiny possibility, a probability that is not zero. He was trying to learn about rare letter groupings that did not appear in his collection of intercepted German messages. By estimating the frequency of missing species in his sample, he could use Bayes to estimate the probability of those letter groupings appearing in a much larger sample of messages and in the very next Enigma message he received. Decades later, DNA decoders and artificial intelligence analysts would adopt the same technique.
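  The bird-watching idea can be made concrete with a small sketch. It uses the simplest form of what is now usually called the Good-Turing estimate: the chance that the next bird belongs to a species not yet seen is approximated by the share of sightings that came from species seen exactly once. The function name and the field notes below are invented for illustration and are not taken from Good's or Turing's own work.

```python
from collections import Counter


def missing_mass(sightings):
    """Estimate the total probability of all species not yet seen.

    Simplest Good-Turing form:
    (number of species seen exactly once) / (total number of sightings).
    """
    counts = Counter(sightings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sighting")
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / total


# Invented field notes: 12 sightings, two species seen only once.
notes = ["robin"] * 5 + ["wren"] * 3 + ["finch"] * 2 + ["owl", "kestrel"]
unseen = missing_mass(notes)
print(f"Estimated chance the next bird is a new species: {unseen:.3f}")
```

  With these made-up notes the estimate is 2/12, so unseen species get a small but nonzero probability instead of the zero that a raw frequency count would give them.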

  Clever as he was, Good could be difficult to get along with, and he moved from post to post. After he spent a year at a cryptography think tank, the Institute for Defense Analyses (then at Princeton University), many coworkers were relieved to see him go. In 1967 Good moved permanently to Virginia Polytechnic Institute and State University in Blacksburg. At his insistence, his contract stipulated that he would always be paid one dollar more than the football coach. He worked far from the Bayesian mainstream, however; during the 1960s Bayes’ rule in the United States was concentrated at the universities of Chicago and Wisconsin and at Harvard and Carnegie Mellon.

  Sidelined by geography and silenced by the British government’s classification of his work with Turing, Good mailed unsolicited carbons of his typed curriculum vitae—what he called his Private List of more than 800 articles and four books12—to startled colleagues. He numbered every work and marked a significant portion of them as classified. Only as the British slowly declassified his cryptanalysis work could he reveal Bayes’ success with the Enigma code. When that happened, he bought a vanity license plate emblazoned with his James Bond spy status and his initials, 007 IJG.

  Hampered by governmental secrecy, his own personality, and an inability to explain his work, Good remained an independent voice within the Bayesian community as two others became its intellectual leaders.

  Unlike Good, Dennis Lindley and Jimmie Savage evolved almost accidentally as Bayesians. When Lindley was a boy during the German bombing of London, a remarkable mathematics teacher named M. P. Meshenberg tutored him in the school’s air raid shelter. Meshenberg convinced Dennis’s father, a roofer proud to have never read a book, that the boy should not quit school early or be apprenticed to an architect. Because of Meshenberg, Dennis stayed in school and won a mathematics scholarship to attend Cambridge University. Later in the war, when the British government asked mathematicians to learn some statistics, Lindley helped introduce statistical quality control and inspection into armaments production for the Ministry of Supply.

  After the war he returned to Cambridge, the British center of probability, where Jeffreys, Fisher, Turing, and Good had either worked or studied. There Lindley became interested in turning the statisticians’ collection of miscellaneous tools into a “respectable branch of mathematics,” a complete body of thought based on axioms and proven theorems.13 Andrei Kolmogorov had done the same for probability in general in the 1930s. Since Fisher in particular often arrived at his ideas intuitively and neglected mathematical details, there was plenty of room for another mathematician to straighten things out logically.

  In 1954, a year after publishing a lengthy article summarizing his project, Lindley visited the University of Chicago, only to realize that Savage had done an even better job of it. Although Lindley and Savage would soon become leading spokesmen for Bayes’ rule, neither realized at this point he was headed down a slippery slope toward Bayes. Each thought he had merely put traditional statistical techniques on a rigorous mathematical footing. Only later did they realize they could not move logically from their rigorous axioms and theorems to the ad hoc methods of frequentism. Lindley said, “We were both fools because we failed completely to recognize the consequences of what we were doing.”14

  Despite being almost blind, Savage was immensely learned on an encyclopedic range of topics. His father, a Jewish East European immigrant with a third-grade education, had changed the family’s name from Ogushevitz to Savage and settled in Detroit. Both Jimmie and his brother Richard were born with extreme myopia and involuntary eye movements. As an adult, before crossing a street Jimmie would wait five or ten minutes to make sure there were no oncoming cars, and attending lectures he would approach the blackboard and peer at it through a powerful monocular. The brothers could read quite comfortably, however, and as children called themselves “reading machines”;15 their mother, a high school graduate and nurse, had kept them supplied with library books. Reading was always a privilege to be cherished, and Jimmie read with a rare intensity and developed the embarrassing habit of questioning everything. His wide-ranging studies and insatiable curiosity would alter the history of Bayes’ rule.

  Because of his eyesight, however, Savage almost missed getting a college education. His teachers considered him feebleminded and unsuited for higher studies. He was finally admitted to Wayne (later Wayne State) University in Detroit. From there he transferred to the chemistry department at the University of Michigan, only to be rejected again, this time as unfit for laboratory work. A kindly mathematics professor, G. Y. Rainich, rescued him by teaching a class of visually impaired students in total darkness. Rainich called it “mental geometry . . . just like in Russia,” where many schools could not afford candles.16 Three students in the class, including Savage, earned doctorates.

  During the Second World War Savage worked in the Statistical Research Group at Columbia University with the future Nobel Prize–winning economist Milton Friedman. The experience persuaded Savage to switch from pure mathematics to statistics. After the war he moved to the University of Chicago, a center of scientific excitement, thanks in large part to the dazzling Nobel Prize winner Enrico Fermi, the last physicist to excel at both experimentation and theory. Fermi himself used Bayes. In the autumn of 1953, when Jay Orear, one of Fermi’s graduate students, was struggling with a problem involving three unknown quantities, Fermi told him to use a simple analytic method that he called Bayes’ theorem and that he had derived from C. F. Gauss. A year later, when Fermi died at the age of 53, Bayes’ rule lost a stellar supporter in the physical sciences.

  Fermi was not the only important physicist to use Bayesian methods during this period. A few years later, at Cornell University, Richard Feynman suggested using Bayes’ rule to compare contending theories in physics. Feynman would later dramatize a Bayesian study by blaming rigid O-rings for the Challenger shuttle explosion.

  During this exciting period in 1950s Chicago, Savage and Allen Wallis founded the university’s statistics department, and Savage attracted a number of young stars in the field. Reading widely, Savage discovered the work of Émile Borel, Frank Ramsey, and Bruno de Finetti from the 1920s and 1930s legitimizing the subjectivity in Bayesian methods.

  Savage’s revolutionary book Foundations of Statistics was the third in the series of pathbreaking Bayesian publications in the fifties. It appeared in 1954, four years after Bailey’s insurance paper and Good’s book and one year after Lindley’s paper. Because of Ramsey’s early death, it fell to Savage to develop the young philosopher’s ideas about utility and to turn Bayes’ rule for making inferences based on observations into a tool for decision making and action.

  Almost defiantly, Savage proclaimed himself a subjectivist and a personalist. Subjective probability was a measure of belief. It was something you were willing to use for a bet, particularly on a horse race, where bettors share the same information about a horse but come to different conclusions about its chances and where the race itself can never be precisely replicated. Subjective opinions and professional expertise about science, medicine, law, engineering, archaeology, and other fields were to be quantified and incorporated into statistical analyses.

  More than anyone else Savage forced people to think about combining two concepts: utility (the quantification of reward) and probability (the quantification of uncertainty). He argued that rational people make subjective choices to minimize expected losses.
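  A toy calculation shows how utility and probability combine. Each action is scored by its loss in every possible state of the world, each loss is weighted by the decision maker's personal probability for that state, and the action with the smallest expected loss is chosen. The umbrella scenario, the probabilities, and the loss values are all assumptions made up for this sketch.

```python
def expected_loss(losses, beliefs):
    """Average the loss of one action over the believed state probabilities."""
    return sum(beliefs[state] * losses[state] for state in beliefs)


def best_action(loss_table, beliefs):
    """Return the action whose expected loss is smallest."""
    return min(loss_table, key=lambda action: expected_loss(loss_table[action], beliefs))


# Hypothetical personal probabilities and losses (arbitrary units).
beliefs = {"rain": 0.3, "dry": 0.7}
loss_table = {
    "carry umbrella": {"rain": 1.0, "dry": 1.0},   # mild nuisance either way
    "leave umbrella": {"rain": 10.0, "dry": 0.0},  # soaked if it rains
}

choice = best_action(loss_table, beliefs)
for action, losses in loss_table.items():
    print(action, expected_loss(losses, beliefs))
print("chosen:", choice)
```

  Someone with a different personal probability of rain could rationally reach the opposite choice from the same loss table, which is the subjective element Savage insisted on.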

  Savage was confronting the thorniest objection to Bayesian methods: “If prior opinions can differ from one researcher to the next, what happens to scientific objectivity in data analysis?”17 Elaborating on Jeffreys, Savage answered as follows: as the amount of data increases, subjectivists move into agreement, the way scientists come to a consensus as evidence accumulates about, say, the greenhouse effect or about cigarettes being the leading cause of lung cancer. When they have little data, scientists disagree and are subjectivists; when they have piles of data, they agree and become objectivists. Lindley agreed: “That’s the way science is done.”18
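  The convergence argument can be illustrated with a minimal sketch, assuming a yes-or-no outcome and Beta priors so that the update has a closed form. Two analysts start with sharply different opinions about an unknown success rate; as the shared data grow, their posterior means draw together. The priors and the 70 percent success rate in the data are invented for the example.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after binomial data (conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two hypothetical analysts with opposing Beta priors.
skeptic = (1, 9)      # prior mean 0.1
enthusiast = (9, 1)   # prior mean 0.9


def disagreement(successes, trials):
    """Gap between the two analysts' posterior means after seeing the same data."""
    return abs(
        posterior_mean(*enthusiast, successes, trials)
        - posterior_mean(*skeptic, successes, trials)
    )


for trials in (10, 100, 1000):
    successes = 7 * trials // 10  # invented data: 70% successes
    print(trials, round(disagreement(successes, trials), 4))
```

  Under these assumptions the gap equals 8 / (10 + trials), so it starts at 0.4 after ten observations and falls below 0.01 after a thousand.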

  But when Savage trumpeted the mathematical treatment of personal opinion, no one—not even he and Lindley—realized yet that he had written the Bayesian Bible. “Neither of us would have known at the time what was meant by saying we were Bayesians,” Lindley said. Savage’s book did not use the term “Bayesian” at all and referred to Bayes’ rule only once. Savage’s views and his book gained popularity slowly, even among those predisposed to Bayes’ rule. Many had hoped for a how-to manual like Fisher’s Statistical Methods for Research Workers. Lacking computational machinery to implement their ideas, Bayesians were limited to a few simple problems involving easily solved integrals and would spend years adapting centuries-old methods for calculating them. Savage, though, said he was “little inclined to high speed machines for help. This is no doubt partly due to my being reactionary . . . but my main interests are in the qualitative. . . . Tables of functions depending on several parameters are almost unprintable and, when printed quite unintelligible.”19 Savage continued instead to prove abstract mathematical theorems and work on building a logical foundation for Bayesian methods.

  His applications were too whimsical to be useful: what is the probability that aspirin curls rabbits’ ears? what is the most probable speed of neon light through beer? Some thought Savage’s failure to tackle serious problems impeded the spread of Bayesian methods. Lindley complained, “Perhaps statistics would have benefited more if he had not been so punctilious in replying to correspondents and so helpful with students, and instead developed more operational methods that the writers and graduates could have used.”20

  Some readers were also troubled by the fact that Savage used aspects of frequentism to argue for Bayes’ subjective priors, taboo since the nineteenth century. As Savage explained, when he wrote the book he was “not yet a personalistic Bayesian.” He thought he came to Bayesian statistics “seriously only through recognition of the likelihood principle; and it took me a year or two to make the transition.”21

  According to the likelihood principle, all the information in experimental data gets encapsulated in the likelihood portion of Bayes’ theorem, the part describing the probability of objective new data; the prior played no role. Practically speaking, the principle greatly streamlined analysis. Scientists could stop running an experiment when they were satisfied with the result or ran out of time, money, and patience; nonBayesians had to continue until some frequency criterion was met. Bayesians would also be able to concentrate on what happened, not on what could have happened according to Neyman-Pearson’s sampling plan.
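  The point about stopping rules can be checked numerically. In the sketch below, an invented coin-tossing experiment yields 9 heads and 3 tails. One experimenter planned exactly 12 tosses; another planned to stop at the third tail. The two sampling plans give likelihoods that differ only by a constant factor, so with the same prior (a uniform prior on a grid is assumed here) the posteriors coincide.

```python
from math import comb

HEADS, TAILS = 9, 3                       # invented data
grid = [i / 200 for i in range(1, 200)]   # candidate values for P(heads)


def binomial_likelihood(theta):
    """Plan A: toss exactly 12 times, count the heads."""
    return comb(HEADS + TAILS, HEADS) * theta**HEADS * (1 - theta) ** TAILS


def negative_binomial_likelihood(theta):
    """Plan B: keep tossing until the third tail appears."""
    return comb(HEADS + TAILS - 1, TAILS - 1) * theta**HEADS * (1 - theta) ** TAILS


def posterior(likelihood):
    """Normalize likelihood times a uniform prior over the grid."""
    weights = [likelihood(theta) for theta in grid]
    total = sum(weights)
    return [w / total for w in weights]


fixed_n = posterior(binomial_likelihood)
stop_at_third_tail = posterior(negative_binomial_likelihood)
largest_gap = max(abs(a - b) for a, b in zip(fixed_n, stop_at_third_tail))
print("largest difference between the two posteriors:", largest_gap)
```

  A frequency-based significance test, by contrast, would treat the two plans differently, because it also weighs outcomes that could have occurred under each plan but did not.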

  The transition to Bayes took Savage several years, but by the early 1960s he had accepted its logic wholeheartedly, fusing subjective probability with new statistical tools for scientific inference and decision making. As far as Savage was concerned, Bayes’ rule filled a need that other statistical procedures could not. Frequentism’s origin in genetics and biology meant it was involved with group phenomena, populations, and large aggregations of similar objects. As for using statistical methods in biology or physics, the Nobel Prize–winning physicist Erwin Schrödinger said, “The individual case [is] entirely devoid of interest.”22 Bayesians like Savage, though, could work with isolated one-time events, such as the probability that a chair weighs 20 pounds, that a plane would be late, or that the United States would be at war in five years.

  Bayesians could also combine information from different sources, treat observables as random variables, and assign probabilities to all of them, whether they formed a bell-shaped curve or some other shape. Bayesians used all their available data because each fact could change the answer by a small amount. Frequency-based statisticians threw up their hands when Savage inquired whimsically, “Does whiskey do more harm than good in the treatment of snake bite?” Bayesians grinned and retorted, “Whiskey probably does more harm than good.”23

  As a movement, Bayes was looking more akin to a philosophy—even a religion or a state of mind—than to a true-or-false scientific law like plate tectonics. According to David Spiegelhalter of Cambridge University, “It’s much more basic. . . . A huge sway of scientists says you can’t use probability to express your lack of knowledge or one-time events that don’t have any frequency to it. Probability came very late into civilization . . . [and many scientists find it] rather disturbing because it’s not a process of discovery. It’s more a process of interpretation.”24

  “Mathematical scientists often sense a combination of harmony and power in certain formulas,” explains Robert E. Kass, a Bayesian at Carnegie Mellon University. “There is at once a deep esthetic experience and a pragmatic recognition of profound consequences, leading to what Einstein called ‘the cosmic religious feeling.’ Bayes Theorem gives such a feeling. It says there is a simple and elegant way to combine current information with prior experience in order to state how much is known. It implies that sufficiently good data will bring previously disparate observers to agreement. It makes full use of available information, and it produces decisions having the least possible error rate. Bayes’ Theorem is awe-inspiring.” Unfortunately, Kass continued, “when people are captivated by its spell, they tend to proselytize and become blinded to its fundamental vulnerability. . . . [that] its magical powers depend on the validity of its probabilistic inputs.”25

  With zealots proselytizing Bayes as an all-encompassing panacea, the method inspired both religious devotion and dogmatic opposition. The battle between Bayesians and their equally fervent foes raged for decades and alienated many bystanders. As one onlooker reflected, “It was a huge food fight. It was devastating. They hated each other.”26 A prominent statistician lamented, “Bayesian statisticians do not stick closely enough to the pattern laid down by Bayes himself: if they would only do as he did and publish posthumously we should all be saved a lot of trouble.”27

  Savage became one of the believers. He developed into a full-blown, messianic Bayesian, “the most extreme advocate of a Bayesian . . . ever seen,” William Kruskal of the University of Chicago said. Savage recast the controversy over Bayes’ rule in its most extreme form as subjectivity versus objectivity. For him, as for Lindley, the rule was the one-and-only, winner-take-all method for reaching conclusions in the face of uncertainty. Bayes’ rule was right and rational, they felt, and other views were wrong, and it was neither necessary nor desirable to admit compromise.

  “Personal probability . . . became for [Savage] the only sensible approach to probability and statistics,” Kruskal recalled sadly. “If one were not in substantial agreement with him, one was inimical, or stupid, or at the least inattentive to an important scientific development. This attitude, no doubt sharpened by personal difficulties and by the mordant rhetoric of some anti-Bayesians, exacerbated relationships between Jimmie Savage and many old professional friends.”28

  Savage’s last year at Chicago, 1960, was fraught with turmoil. Although his department colleagues knew nothing about it, the administration was trying to abolish the statistics department, and Savage was fighting to get the decision reversed. His marriage was disintegrating and, hoping to save it, he moved to the University of Michigan. As he departed, he told his colleagues, “I proved the Bayesian argument in 1954. None of you have found a flaw in the proof and yet you still deny it. Why?”29 When he tried to return to Chicago, members of the department he had formed and chaired voted against rehiring him. At first, no other American or British university would offer him a position. In 1964 he moved to Yale University, remarried, and achieved some level of tranquility.

 
