The Theory That Would Not Die
Newspaper reporting was meager until 1958, when an improperly closed lock in a B-47’s bomb bay gave way and a “relatively harmless” bomb fell into Walter Gregg’s garden in Mars Bluff, South Carolina.5 The conventional explosives detonated on impact, digging a crater 30 feet deep and 50 to 70 feet across, destroying Gregg’s house, damaging nearby buildings, and killing several chickens. No people died, but news coverage was extensive; reporters typically said that a “TNT triggering device” had exploded. RAND noted disapprovingly that a Time magazine article was “astonishingly accurate.”6 Congress, Britain’s Labour Party, and Radio Moscow complained.
The air force paid the Greggs $54,000, and all B-47 and B-52 flights carrying nuclear weapons were suspended until new safety measures could be introduced. SAC also established a new policy: nuclear bombs were to be jettisoned on purpose only into oceans or designated “water masses. . . . Hence only uncontrolled drops may attract public attention in the future.”7
As the press became increasingly suspicious of the accidents, Iklé and another RAND researcher, Albert Wohlstetter, became concerned. Iklé recommended that the government remain silent about the presence of nuclear weapons in aircraft crashes. As Iklé and Madansky worked on their study, an accident occurred that could have provoked an international scandal. A wheel casting on a B-47 failed at the air force’s fueling base in Sidi Slimane, French Morocco. High explosives detonated and a fire raged for seven hours, destroying the nuclear weapon and capsule aboard.
Given all these accidents involving unarmed nuclear weapons, Madansky felt he could no longer assume, as SAC and frequentist statisticians had, that an accident involving an H-bomb could never occur. Instead, he decided, he needed “another theology, . . . another brand of inference,” where the possibility of an accident was not necessarily zero.
Frequentism was no help. “But, but, but,” Madansky said later, “if you’re willing to admit a shred of disbelief, you can let Bayes’ theorem work. . . . Bayes is the only other theology that you can go to. It’s just sort of natural for this particular problem. At least that’s how I felt back then.”8 As Dennis Lindley had argued, if someone attaches a prior probability of zero to the hypothesis that the moon is made of green cheese, “then whole armies of astronauts coming back bearing green cheese cannot convince him.” In that connection, Lindley liked to quote Cromwell’s Rule from the Puritan leader’s letter to the Church of Scotland in 1650: “I beseech you, in the bowels of Christ, think it possible you may be mistaken.”9 In the spirit of Cromwell’s Rule, Madansky adopted Bayes as his “alternate theology.”
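Lindley’s green cheese point is a direct consequence of the form of Bayes’ rule; as a standard gloss in modern notation (not a line from Madansky’s report), for any hypothesis $H$ and evidence $E$,

$$
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)} \;=\; 0 \qquad \text{whenever } P(H) = 0,
$$

so a prior of exactly zero can never be budged by any amount of data, while a prior that leaves even a sliver of probability on “an accident is possible” can be.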
Many Cold War statisticians knew it well. They were using it to deal with one of their biggest problems, estimating the reliability of the new intercontinental ballistic missiles. “We didn’t know how reliable the missiles were,” Madansky explained, “and we had a limited amount of test data to determine it, and so a number of the people working in reliability applied Bayesian methods. The entire field, from North American Rockwell, Thompson Ramo Wooldridge, Aerospace, and others, was involved. I’m sure Bayesian ideas floated around them too. We all knew about it. . . . It was just a natural thing to do.”10
Madansky immediately set out to measure how much credence could be given to the belief that no unauthorized detonations would occur in the future. He started with “statistically speaking, this simple commonsense idea based on the notion that there is an a priori distribution of the probability of an accident in a given opportunity, which is not all concentrated at zero.”11 The decision to incorporate a whisper of doubt into his prior was important. Once a Bayesian considered the possibility that even one small accident might have occurred in the past 10,000 opportunities, the probability of an accident-free future dropped significantly.
Politically and mathematically, Madansky faced an extremely difficult problem. As a young civilian challenging one of the military’s fundamental beliefs about the Cold War, he would have to convince decision makers that, although nothing catastrophic had occurred, something might happen in the future. He would have to explain the Bayesian process to nonspecialists. And because the military was often suspicious of civilians making unwarranted suggestions, he would have to make as few initial assumptions as possible. In addition, there was the statistical problem posed by the small number of accidents and, fortunately, no Armageddons.
Not wanting to stick his neck out, Madansky decided to make his prior odds considerably weaker than 50–50. “I tried to figure out how to avoid making any specifications about what the prior should be.”12 To refine his minimally informative prior, he added another common-sense notion: the probability of an accident-free future depends on the length of the accident-free past and the number of future accident opportunities. Madansky had no direct evidence because there had never been a nuclear accident. But the military had plenty of indirect data, and he began using it to modify his prior.
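The flavor of such a calculation can be sketched with a textbook stand-in rather than Madansky’s classified one. Assume, purely for illustration, a uniform prior on the per-opportunity accident probability, independent opportunities, and invented counts of past and future opportunities; the Beta integrals then collapse to a simple ratio.

```python
# A minimal sketch of the kind of calculation described above, not Madansky's
# classified computation. Assumptions (for illustration only): a uniform
# Beta(1, 1) prior on the per-opportunity accident probability p, independent
# opportunities, and made-up counts of past and future opportunities.

def prob_accident_free_future(past: int, future: int) -> float:
    """P(no accident in the next `future` opportunities, given none in the
    past `past` opportunities), under a uniform prior on p.  The Beta
    integrals reduce to (past + 1) / (past + future + 1)."""
    return (past + 1) / (past + future + 1)

n, m = 10_000, 10_000                        # illustrative counts only
p_safe = prob_accident_free_future(n, m)
print(f"P(accident-free future)  = {p_safe:.3f}")     # about 0.500
print(f"P(at least one accident) = {1 - p_safe:.3f}")
```

The structure, not the numbers, is the point: a spotless record over n opportunities, pushed through a prior that is not concentrated at zero, still leaves roughly even odds of at least one accident over a comparable future.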
He knew that the military was already developing plans to greatly increase the number of flights carrying nuclear weapons. SAC was planning a system of 1,800 bombers capable of carrying nuclear weapons; approximately 15% of them would be in the air at all times, armed and ready for attack. At that time, SAC’s B-52 jet-powered Stratofortress bombers carried up to four nuclear bombs, each with an explosive power between 1 million and 24 million tons of TNT, or up to 1,850 Hiroshima bombs. The United States was also planning to outfit intercontinental ballistic missiles with hydrogen warheads, accelerate the production of intermediate range ballistic missiles, and negotiate with NATO countries for launching rights and military bases. The military would soon be dealing with shorter alarm times, increased alertness, and more decentralized weaponry, all factors that could increase the likelihood of a catastrophe.
Madansky calculated the number of “accident opportunities” based on the number of weapons, their longevity, and the number of times they were aboard planes or handled in storage.13 Accident opportunities corresponded to flipping coins and throwing dice. Counting them proved to be an important innovation.
“A probability that is very small for a single operation, say one in a million, can become significant if this operation will occur 10,000 times in the next five years,” Madansky wrote.14 The military’s own evidence indicated that “a certain number of aircraft crashes” was inevitable. According to the air force, a B-52 jet, the plane carrying SAC’s bombs, would average 5 major accidents per 100,000 flying hours. Roughly 3 nuclear bombs were dropped accidentally or jettisoned on purpose per 1,000 flights that carried these weapons. Because 80% of aircraft crashes occurred within 3 miles of an air force base, the likelihood of public exposure was growing. And so it went. None of these studies involved a nuclear explosion, but to a Bayesian they suggested ominous possibilities.
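Madansky’s sentence is simple arithmetic once the operations are treated, for illustration, as independent trials:

$$
1 - \left(1 - 10^{-6}\right)^{10{,}000} \;\approx\; 1 - e^{-0.01} \;\approx\; 0.01,
$$

roughly a 1 percent chance of at least one such event over the 10,000 operations, even though any single operation looks utterly safe.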
Computationally, Madansky was confident that RAND’s two powerful computers, a 700 series IBM and the Johnniac, designed by and named for John von Neumann, could handle the job. But he hoped to avoid using them by solving the problem with pencil and paper.
Given the limited power and availability of computers in the 1950s, many Bayesians were searching for ways to make calculations manageable. Madansky latched onto the fact that certain priors produce posteriors belonging to the same family of probability curves. Bailey had used the same technique in the late 1940s, and it would later become known, through Howard Raiffa and Robert Schlaifer, as the method of conjugate priors. When Madansky read their book describing it, he was pleased to learn that his prior had a name and a rationale: “I was just doing it ad hoc.”15
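A minimal sketch of the conjugacy idea, using the Beta-binomial pair as the example (the text does not say which family Madansky actually used): with a Beta prior on the accident probability, the posterior after any number of observations is again a Beta, so updating is pencil-and-paper arithmetic rather than numerical integration.

```python
# Conjugacy illustrated with the Beta-binomial pair; an assumption chosen for
# illustration, since the family Madansky actually used is not stated here.
# A Beta(a, b) prior on the accident probability p, combined with k accidents
# in n opportunities, gives a Beta(a + k, b + n - k) posterior: prior and
# posterior share the same family of curves, only the parameters move.

def beta_binomial_update(a: float, b: float, accidents: int, opportunities: int):
    """Return the parameters of the Beta posterior after the observations."""
    return a + accidents, b + (opportunities - accidents)

# Illustrative numbers: a weak Beta(1, 1) prior and a spotless record.
a_post, b_post = beta_binomial_update(1.0, 1.0, accidents=0, opportunities=10_000)
print(a_post, b_post, a_post / (a_post + b_post))   # posterior mean is about 1e-4
```

Because each new batch of data only shifts two parameters, the whole update can be done by hand, which is presumably part of what made the trick attractive to someone hoping to avoid the Johnniac.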
Using his tractable prior, classified military data, and informed guesswork, Madansky came to a startling conclusion. It was highly probable that SAC’s expanded airborne alert system would be accompanied by 19 “conspicuous” weapon accidents each year.
Madansky wrote an elementary summary that high-level military decision makers could understand and inserted it into RAND’s final report, “On the Risk of an Accidental or Unauthorized Nuclear Detonation” (RM-2251, U.S. Air Force Project RAND), dated October 15, 1958. The report listed Iklé
as its primary author in collaboration with Madansky and a psychiatrist, Gerald J. Aronson. Until then, many RAND reports had been freely published, but air force censors were clamping down on its think tank, and this report was classified, its initial readership limited to a select few. It was finally released more than 41 years later, on May 9, 2000, with numerous passages blacked out.
In view of what would happen a few years later in Spain, much of the report seems prescient. Madansky could not predict when or where the accident would occur, but he was sure of two things. The likelihood of an accident was rising, and it was in the military’s interest to make its nuclear arsenal safer. Given the media’s increasingly savvy coverage of accidents involving nuclear weaponry, Iklé foresaw Soviet propaganda, citizens’ campaigns to limit the use of nuclear devices, and foreign powers demanding an end to American bases on their soil. Who knew, but the British Labour Party might even win an election or NATO might crumble?
As a result, Iklé and Madansky advocated safety features that included requiring at least two people to arm a nuclear weapon; electrifying arming switches to jolt anyone who touched them; arming weapons only over enemy territory; installing combination locks inside warheads; preventing the release of radioactive material in accidental fires of high-energy missile fuels; and switching the nuclear matter inside weapons from plutonium to uranium because the latter would contaminate a smaller area. The report also recommended placing reassuring articles in scientific journals to publicize research indicating that plutonium emissions were less dangerous to humans than had been thought; the source of the research was to be disguised.
Later, after SAC implemented its airborne alert program and began keeping significant numbers of nuclear-armed planes aloft at all times, Iklé and Madansky followed up with a more pointed and less mathematical summary about accident rates. As of 2010 this document, meant only for internal circulation, was still classified. In an appendix to the first report, Iklé and Aronson tackled the topic of mental illness among military personnel in charge of nuclear bombs. Such concerns were widespread at the time. Aronson believed that men who worked near the bombs should be psychologically tested by being confined alone in a chamber, deprived of sleep and sensory stimuli for several hours, and perhaps dosed with hallucinogens such as LSD. He predicted that “only between one-third and one-quarter of ‘normal’ volunteer-subjects would be able to withstand [it] for more than several hours.”16 It was later learned that the Central Intelligence Agency had financed a variety of LSD (lysergic acid diethylamide) experiments on people without their knowledge or consent during the 1950s, although the practice violated ethical standards even then.
After the report was issued, Iklé, his knees shaking, went to brief “a sizable audience of Air Force generals.”17 RAND researchers assumed that General LeMay would scorn their conclusions. LeMay had directed the fire-bombing of Japanese cities during the Second World War, his ghostwritten autobiography would propose bombing the Vietnamese “back into the Stone Age” in the mid-1960s, and he was the model for “Buck” Turgidson, the insanely warlike, cigar-chomping general played by George C. Scott in the film Dr. Strangelove. Iklé described LeMay’s attitude about nuclear weapons as one of “injudicious pugnacity.”18
But the general surprised him. The day after RAND presented its report in Washington, LeMay asked for a copy. Iklé said LeMay later issued a “blizzard” of orders, among others calling for the two-man rule and coded locks. The army and navy followed suit. Iklé put LeMay’s response “in the ‘success’ column of my life’s ledger.”19
According to most reports, however, few of the coded locks were actually installed on nuclear weapons until John F. Kennedy became president. Four days after Kennedy’s inauguration, a SAC B-52 disintegrated in midflight. One of its two 24-megaton hydrogen bombs smashed into a swamp near Goldsboro, North Carolina, and a large chunk of enriched uranium sank more than 50 feet, where it presumably remains to this day. Analysis showed that only one of the bomb’s six safety devices had functioned properly. JFK was told of scores of nuclear weapon accidents—according to Newsweek, more than 60 since the Second World War. From then on, the Kennedy administration vigorously pursued nuclear weapon safety and added coded locks to nuclear weapons.
Iklé went on to become a leading hard-line specialist in military and foreign policy and would be awarded two Distinguished Public Service Medals, the Department of Defense’s highest civilian award. Madansky became a professor at the University of Chicago, where he developed a reputation as a neutral pragmatist in the battles between Bayesians and frequentists. RAND gradually weaned itself from air force funding by diversifying into social welfare research.
The world can be thankful that Madansky’s Bayesian statistics forced the military to tighten safety measures. A number of false alerts suggestive of Soviet nuclear attacks were identified correctly before SAC could launch a counterattack. Phenomena causing false alerts included the aurora borealis, a rising moon, space debris, false U.S. radar signals, errors in the use of computers (such as a mistaken Pentagon warning of incoming Soviet missiles in 1980), routine Soviet maintenance procedures after the Chernobyl accident, a Norwegian weather research missile, and “more hidden problems of unauthorized acts.”
10. 46,656 varieties
In sharp contrast to the super secrecy of Madansky’s H-bomb report, the schism between entrenched frequentists and upstart Bayesians was getting downright noisy. As usual, the bone of contention was the subjectivity of Thomas Bayes’ pesky prior. The idea of importing knowledge that did not originate in the statistical data at hand was anathema to the anti-Bayesian duo Fisher and Neyman. Since Fisher and Neyman drew conclusions and made predictions about their data without using prior odds, Bayesian theoreticians, on the defensive, struggled to avoid priors altogether.
Bayesian theories mushroomed in glorious profusion during the 1960s, and Jack Good claimed he counted “at least 46656 different interpretations,” far more than the world had statisticians.1 Versions included subjective, personalist, objective, empirical Bayes (EB for short), semi-EB, semi-Bayes, epistemic, intuitionist, logical, fuzzy, hierarchical, pseudo, quasi, compound, parametric, nonparametric, hyperparametric, and non-hyperparametric Bayes. Many of these varieties attracted only their own creators, and some modern statisticians contend that hairsplitting produced little pathbreaking Bayesian theory. When asked how to differentiate one Bayesian from another, a biostatistician cracked, “Ye shall know them by their posteriors.”
Almost unnoticed in the hoopla, the old term “inverse probability” was disappearing, and a modern term, “Bayesian inference,” was taking its place. As the English language swept postwar statistical circles, articles by British theorists began to look more important than Laplace’s French ones. “Much that’s been written about the history of probability has been distorted by this English-centric point of view,” says Glenn Shafer of Rutgers University.2 More than language may have been involved. In 2008, when the Englishman Dennis Lindley was 85 years old, he said he was now almost convinced that Laplace was more important than Thomas Bayes: “Bayes solved one narrow problem; Laplace solved many, even in probability. . . . My ignorance of the Frenchman’s work may be cultural, since he did not figure prominently in my mathematical education.” Then he added with characteristic honesty, “But I am biased: the French let us down during WW2 and then there was the ghastly de Gaulle.”3
In England the ferment over Bayes’ rule extended even into Fisher’s family. His son-in-law George E. P. Box was the young chemist who had to say he was transporting a horse in order to consult Fisher during the Second World War. Like Fisher, Box came to believe that statistics should be more closely entwined with science than with mathematics. This view was reinforced by Box’s later work for the chemical giant ICI in Britain and for the quality control movement with W. Edwards Deming and the Japanese auto industry.
When Box organized a statistics department at the University of Wisconsin in 1960, he taught, for the first time, a class called Foundations of Statistics. “Week after week,” he said, “I prepared my notes very carefully. But the more I did, the more I became convinced that the standard stuff I’d studied under Egon Pearson was wrong. So gradually my course became more and more Bayesian. . . . People used to make fun of it and say it was all nonsense.”4
Helping scientists with scanty data, Box found that traditional statistics produced messy, unsatisfactory solutions. Still, frequentism worked well for special cases where data fell into bell-shaped probability curves and midvalues were assumed to be averages. So, as Box said, “Comparing averages looked right to me until Stein.”5
Stein’s Paradox called those averages into question. Charles Stein, a theoretical statistician, had been thinking about something that looked quite simple: estimating a mean. Statisticians do not concern themselves with individuals; their bread and butter is a midvalue summarizing large amounts of information. The centuries-old question was which midvalue works best for a particular problem. In the course of his investigation, Stein discovered a method that, ironically, produced more accurate predictions than simple averages did. Statisticians called it Stein’s Paradox. Stein called it shrinkage. As a frequency-based theorist, he studiously avoided discussing its relationship with Bayes.
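The effect is easy to see numerically. The sketch below uses the later James-Stein form of shrinkage (a standard textbook formulation, not a reconstruction of Stein’s own derivation): when three or more means are estimated at once, pulling every raw average toward a common point lowers the total squared error.

```python
import numpy as np

# A small simulation of shrinkage in its later James-Stein form (an
# illustration, not a reconstruction of Stein's paper).  Each of ten unknown
# means is observed once with unit Gaussian noise; shrinking the raw
# observations toward zero lowers the total squared error, even though each
# raw observation is the natural average-based estimate of its own mean.

rng = np.random.default_rng(0)
true_means = rng.normal(0.0, 2.0, size=10)     # ten unknown quantities (made up)
trials = 5_000
sse_raw = sse_js = 0.0

for _ in range(trials):
    x = true_means + rng.normal(0.0, 1.0, size=true_means.size)  # one noisy look at each
    factor = max(0.0, 1.0 - (x.size - 2) / np.sum(x ** 2))       # James-Stein shrinkage factor
    js = factor * x                                              # pull every estimate toward 0
    sse_raw += np.sum((x - true_means) ** 2)
    sse_js += np.sum((js - true_means) ** 2)

print("total squared error, raw averages:", sse_raw / trials)
print("total squared error, shrunk      :", sse_js / trials)    # reliably smaller
```

To Bayesian eyes the shrinkage factor looks like a prior pulling extreme estimates back toward the pack, which is precisely the connection Stein declined to draw.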