The Theory That Would Not Die
In 1971, at the age of 53, Savage died suddenly of a heart attack. His death at midcareer deprived American Bayesians of their leading spokesman. The New Haven Register had another perspective. Savage had cowritten a book called How to Gamble If You Must. For Bayesians, all assumptions about the future were risky, and gambling was the paradigm of decision making. The Register headlined his obituary, “Yale Statistician Leonard Savage Dies; Authored Book on Gambling.”
In the meantime, Lindley had moved back to Britain, where for many years he was the only Bayesian in a position of authority. In time he built not just Bayesian theory but also strong Bayesian research groups, first at the University College of Wales in Aberystwyth and then at University College London. The latter had England’s most important statistical department and was a temple of frequentism. When Lindley arrived, a colleague said it was “as though a Jehovah’s Witness had been elected Pope.”30 Lindley complained that he “inherited” several statisticians who “would not change their view of statistics.”31 He said, “The general attitude [was] to turn their heads the other way.”32
In an era when many sneered at Bayes, it took courage to create Europe’s leading Bayesian department. Often the only Bayesian at meetings of the Royal Statistical Society and certainly the only combative one, Lindley defended Bayes’ rule like a fearless terrier or a devil’s advocate. In return, he was tolerated almost as comic relief. “Bayesian statistics is not a branch of statistics,” he argued. “It is a way of looking at the whole of statistics.”
Lindley became known as a modern-age revolutionary. He fought to get Bayesians appointed, professorship by professorship, until the United Kingdom had a core of ten Bayesian departments. Eventually, Britain became more sympathetic to the method than the United States, where Neyman maintained Berkeley as an anti-Bayesian bunker. Still, the process left scars: despite Lindley’s landmark contributions he was never named a Fellow of the Royal Society. In 1977, at the age of 54, Lindley forsook the administrative chores he hated and retired early. He celebrated his freedom by growing a beard and becoming what he called “an itinerant scholar” for Bayes’ rule.33
Thanks to Lindley in Britain and Savage in the United States, Bayesian theory came of age in the 1960s. The philosophical rationale for using Bayesian methods had been largely settled. It was becoming the only mathematics of uncertainty with an explicit, powerful, and secure foundation in logic. How to apply it, though, remained a controversial question.
Lindley’s enormous influence as a teacher and organizer bore fruit in the generation to come, while Savage’s book spread Bayesian methods to the military and to business, history, game theory, psychology, and beyond. Although Savage wrote about rabbit ears and neon light in beer, he personally encouraged researchers who would apply Bayes’ rule to life-and-death problems.
8. Jerome Cornfield, Lung Cancer, and Heart Attacks
Bayes came to medical research through the efforts of a single scientist, Jerome Cornfield, whose only degree was a B.A. in history and who relied on the rule to identify the causes of lung cancer and heart attacks.
Lung cancer, extremely rare before 1900 and still uncommon in 1930, sprang up as if out of nowhere shortly after the Second World War. By 1952 it was killing 321 people per million per year in England and Wales. A year later approximately 30,000 new cases were diagnosed in the United States. No other form of cancer showed such a catastrophic leap. Studies in Europe, Turkey, and Japan confirmed the puzzling plague. There seemed to be something special about the disease.
But what could it be? Its cause was unknown. Pathologists thought the increase in lung cancer might be due to improvements in diagnostic methods or to the natural aging of the population. Others blamed exhaust from factories or the growing number of automobiles, tar particles from modern asphalt pavements, or England’s infamous smog from homes heated with open coal-burning fires. Cigarettes, mass-produced since the invention of a cigarette-making machine in 1880, had been patriotically shipped to soldiers during the First World War. Animal studies, though, had failed to demonstrate that tobacco tar was carcinogenic.
As early as 1937 a small-scale study in Germany had pointed ever so tentatively to cigarette smoke. But there were doubts about that too. While 80% of middle-aged men in England and Wales smoked cigarettes, tobacco consumption per capita had dropped slightly. And fumes from cigarettes, which had replaced cigars and pipes, did not seem worse than other smoke.
The world’s most famous biostatistician, Austin Bradford “Tony” Hill, was intrigued. He called himself an arithmetician rather than a mathematician or statistician and, in a series of articles in The Lancet, used straightforward logic to persuade the medical community to objectively quantify its research findings. During the late 1940s, two decades after Ronald Fisher had introduced randomization to agricultural experimentation, Hill introduced randomization to medical research. Inaugurating the modern controlled clinical trial, Hill showed that pertussis vaccine reduced children’s whooping cough cases by 78% and that streptomycin was effective against pulmonary tuberculosis. Bradford Hill became so famous that a letter addressed to “Lord Hill, Bradford, England” reached him.
To identify the most probable causes of the catastrophic increase in lung cancer, Hill and a young physician and epidemiologist, Richard Doll, organized interviews with patients with and without lung cancer in 20 London-area hospitals. All were questioned about their past activities and exposures. The results, published in 1950, were shockingly clear. Of 649 men with lung cancer, only 2 were nonsmokers; a high proportion of the lung cancer patients were heavy cigarette smokers, and their death rate was 20 times higher than that of nonsmokers. A large American study by Ernst L. Wynder and Evarts A. Graham confirmed the British result the same year.
The startling news that cigarettes and lung cancer were linked caused an instant international uproar. Newspapers, radio, television, and magazines competed with medical journals for the latest scoop. With the exception of the influenza epidemic of 1918, no disease had ever sprung as fast from obscurity to worldwide consciousness. Few have engendered such enormous controversy.
The Hill and Doll study remains one of the crowning glories of medical statistics. It was the first sophisticated case-control study of any noninfectious disease. And it persuaded Hill and Doll to quit smoking. Despite its dramatic results, their study did not show that smoking cigarettes actually causes lung cancer. No one could say that for sure. Jerome Cornfield, an American government bureaucrat at the National Institutes of Health (NIH), took up the challenge. And with Hill organizing clinical studies in Britain and Cornfield developing their mathematical defense in the United States, the two tackled complementary aspects of the same problem from different sides of the Atlantic.
The two men had totally dissimilar backgrounds. Hill’s father was a physician with a knighthood, and one of his ancestors had invented the postage stamp. Cornfield was the son of Russian Jewish immigrants and had earned a bachelor’s degree from New York University in 1933. The federal government, desperate for economic data during the Depression, had hired “bright guys” to replace the clerks who had traditionally compiled statistics on unemployment, national income, housing, agriculture, and industry.1 Cornfield qualified as a bright guy, so he signed on as a government statistician for $26.31 a week, $1,368 a year.
Washington, D.C., was still a segregated, southern city. “The rule of thumb was that, if you were Jewish, you could work for the Department of Labor and, if you were Catholic, you could work for the Department of Commerce,” explained Marvin Hoffenberg, a friend of Cornfield’s and later a UCLA professor.2 So Cornfield went to Labor. The U.S. Department of Agriculture ran a so-called Graduate School where mathematically inclined government employees could study statistics, and Cornfield took his only mathematics and statistics courses there.
As Cornfield recalled, “Nobody knew how many unemployed there were, and sampling seemed the way to find out. . . . Statistics had me hooked.”3 Although both Fisher and Neyman lectured on sampling methods at the Graduate School, its director, W. Edwards Deming, was open-minded; he published Thomas Bayes’ essay with an introduction by Edward Molina of Bell Laboratories.
Friends referred to Cornfield’s tenure at the Department of Labor as his serious and exotic phase. He played a major role in revising the Consumer Price Index and in creating one for occupied Japan after the Second World War. But he was “a different kind of a guy,” a friend recalled.4 Unable to think of any good reason for shaving, he grew a little pointed beard, and with his long gaunt frame and an umbrella over his arm he resembled an elegant diplomat strolling jauntily to work. At a time when few others would, he shared his office with a woman statistician and an African American statistical clerk. Next to his mechanical Marchant desktop calculator he installed a Turkish water pipe and could be seen puffing nonchalantly from its two-foot tube.
Cornfield moved to the federal government’s new NIH in 1947. Because infectious diseases were in decline in the United States, NIH epidemiologists were attacking chronic diseases, particularly cancer, heart attacks, and diabetes. To assist them, NIH hired a few people with strong quantitative backgrounds. Only one of them had so much as a master’s degree. Biostatistics was a professional backwater, and throughout the 1950s and 1960s NIH employed only ten or 20 statisticians at any one time. It was this small band that introduced statistical methods to NIH researchers in biology and medicine.
By 1950 most men in the United States smoked, and smoking rates were increasing, especially among women. The favorite brands were unfiltered Camels, Lucky Strikes, Chesterfields, and Philip Morris. When Lorillard Tobacco Company introduced filtered Kents in 1952, the filters contained asbestos, which was not removed until 1957. When 14 studies conducted in five countries showed that lung cancer patients included an alarming percentage of heavy smokers, both Cornfield and his wife quit their 2½-pack-a-day habits.
Cornfield realized that the Hill and Wynder studies did not directly answer the question physicians and their frightened patients were asking: What is my risk? The studies showed the percentages of smokers among groups of people with and without lung cancer, but they did not say what proportion of smokers and nonsmokers was likely to develop lung cancer.
The surest and most direct way to answer the public’s fears was to follow large groups of smokers and nonsmokers for years, prospectively, to see how many of each group developed lung cancer. Unfortunately, studies about the future of large populations require a great deal of money and time, especially for relatively rare problems like lung cancer. That is why Hill and Doll had organized their study as a retrospective one, choosing people who already had lung cancer and asking them about their health histories. Such studies are a relatively quick, cheap way to identify potential causes of a particular disease. As a statistician, however, Cornfield suspected that retrospective studies like Hill and Doll’s could also be used to answer the individual’s haunting question, What’s the chance that I or my loved ones will get this fatal disease?
In 1951 Cornfield used Bayes’ rule to help answer the puzzle. As his prior hypothesis he used the incidence of lung cancer in the general population. Then he combined that with NIH’s latest information on the prevalence of smoking among patients with and without lung cancer. Bayes’ rule provided a firm theoretical link, a bridge, if you will, between the risk of disease in the population at large and the risk of disease in a subgroup, in this case smokers. Cornfield was using Bayes as a philosophy-free mathematical statement, as a step in calculations that would yield useful results. He had not yet embraced Bayes as an all-encompassing philosophy.
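The bridge described above can be sketched in a few lines of Python. This is only an illustration of Bayes' rule applied to case-control proportions, not a reconstruction of Cornfield's 1951 calculation: the function name and all three input numbers are made up for the example, and the sketch assumes the patients without lung cancer stand in for the disease-free population as a whole.

```python
def risk_given_exposure(prior, p_exposed_cases, p_exposed_controls):
    """Bayes' rule: probability of disease given exposure.

    prior              -- incidence of the disease in the general population
    p_exposed_cases    -- share of exposed people among those with the disease
    p_exposed_controls -- share of exposed people among those without it
    """
    numerator = p_exposed_cases * prior
    denominator = numerator + p_exposed_controls * (1.0 - prior)
    return numerator / denominator


# Hypothetical inputs chosen for illustration; they are NOT Cornfield's data.
prior = 0.001             # assumed population incidence of lung cancer
p_smoker_cases = 0.90     # assumed share of smokers among lung cancer patients
p_smoker_controls = 0.50  # assumed share of smokers among patients without it

risk_smokers = risk_given_exposure(prior, p_smoker_cases, p_smoker_controls)
risk_nonsmokers = risk_given_exposure(
    prior, 1.0 - p_smoker_cases, 1.0 - p_smoker_controls
)
relative_risk = risk_smokers / risk_nonsmokers

print(f"risk among smokers:    {risk_smokers:.6f}")
print(f"risk among nonsmokers: {risk_nonsmokers:.6f}")
print(f"relative risk:         {relative_risk:.2f}")
```

With a rare disease, the relative risk computed this way comes out very close to the odds ratio that can be read directly from the case-control table, which is why retrospective data can speak to an individual's risk.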
Cornfield’s paper stunned research epidemiologists. More than anything else, it helped advance the hypothesis that cigarette smoking was a cause of lung cancer. Out of necessity, but without any theoretical justification, epidemiologists had been using case studies of patients to point to possible causes of problems. Cornfield’s paper showed clearly that under certain conditions (that is, when subjects in a study were carefully matched with controls) patients’ histories could indeed help measure the strength of the link between a disease and its possible cause. Epidemiologists could estimate disease risk rates by analyzing nonexperimental clinical data gleaned from patient histories. By validating research findings arising from case-control studies, Cornfield made much of modern epidemiology possible. In 1961, for example, case-control studies would help identify the antinausea drug thalidomide as the cause of serious birth defects.
Two massive efforts in England and the United States during the mid-1950s confirmed Cornfield’s judgment. Because many people had rejected the findings of their retrospective study, Hill and Doll had decided to take a direct approach and conduct a prospective study. They questioned 40,000 British physicians about their current smoking habits and then followed them for five years to see who got lung cancer. In a parallel U.S. study, E. Cuyler Hammond and Daniel Horn followed 187,783 men aged 50 to 69 in New York State for more than 3½ years. Death rates in both countries were similar: heavy smokers were 22 to 24 times more likely to get lung cancer than nonsmokers and, in another surprise discovery, were 42% and 57% more likely to get, respectively, heart and circulatory diseases. Research also showed that cigarettes were more dangerous than pipes, although the risk declined after smoking stopped.
Surprisingly, neither Fisher nor Neyman could accept research results showing that cigarettes caused lung cancer. Both anti-Bayesians were heavy smokers, and Fisher was a paid consultant to the tobacco industry. But more important, neither found epidemiologic studies convincing. And both were correct in pointing out that tobacco could be associated with cancer without causing it. In 1955 they launched a vigorous counterattack, arguing that only experimental data from strictly controlled laboratory and field experiments could predict future disease rates. The most eminent American medical statistician of the day, Joseph Berkson, of the Mayo Clinic in Rochester, Minnesota, joined the attack; Berkson did not believe cigarettes could cause both cancer and heart disease.
Fisher kept up a barrage of angry attacks, including a book and two articles published in highly prestigious journals, Nature and the British Medical Journal. According to Doll, Fisher even went so far as to accuse Hill of scientific dishonesty. Over the course of three years Fisher developed two remarkable hypotheses. The first, believe it or not, was that lung cancer might cause smoking. The second was that a latent genetic factor might give some people hereditary predilections for both smoking and lung cancer. In neither case would smoking cause lung cancer.
Cornfield maintained a running argument with Fisher through the 1950s. Cornfield was already thinking deeply about the standards of evidence needed before observational data could establish cause and effect. Finally, in 1959, he raked Fisher over the coals about smoking with a common-sense, nonmathematical paper that reads like a legal brief. In that seminal paper he and five coauthors systematically addressed every one of Fisher’s alternative explanations for the link between cigarette smoking and lung cancer. They hurled one counterargument after another at Fisher’s hypothetical genetic factor. If cigarette smokers were nine times more likely than nonsmokers to get lung cancer, Fisher’s latent genetic factor must be even larger—though nothing approaching that had ever been seen.
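The size argument in the last sentence can be sketched as an inequality. The notation is introduced here for illustration and does not come from the source. Suppose smoking itself has no effect, a hidden factor raises the lung cancer risk from r_0 to r_1 > r_0, and that factor is present in a fraction p_1 of smokers and p_0 of nonsmokers, with p_1 > p_0 > 0. The relative risk that would then be observed between smokers and nonsmokers is bounded by the ratio of those two fractions:

```latex
\[
\mathrm{RR}_{\text{obs}}
  = \frac{r_0 + (r_1 - r_0)\,p_1}{r_0 + (r_1 - r_0)\,p_0}
  \;\le\; \frac{p_1}{p_0}
\]
```

Under these assumptions, an observed relative risk of nine would require the hidden factor to be at least nine times as common among smokers as among nonsmokers.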
Cornfield dismissed out of hand Fisher’s suggestion that cancer might cause smoking: “Since we know of no evidence to support the view that the bronchogenic carcinoma diagnosed after age 50 began before age 18, the median age at which smokers begin smoking, we shall not discuss it further.”5 Cornfield pointed out that Fisher’s genetic factor would have to spread rapidly and occur more among cigarette smokers than nonsmokers; cause tumors on mouse skin but not on human lungs; weaken with age after a smoker quit; and be more likely in men than women, 60 times more prevalent among two-pack-a-day smokers, and different in pipe and cigar smokers. Yet none of these phenomena had ever been observed.
Fisher wound up looking ridiculous. As Cornfield coolly noted, “A point is reached . . . when a continuously modified hypothesis becomes difficult to entertain seriously.”6 Scientists who can find only one viable explanation for associations in their data have probably found its causal agent. The existence of possible alternative explanations indicates that the cause has probably not yet been found. Cornfield was laying out the road map for future smoking and lung cancer research.
By now, Cornfield the history major had become the most influential biomedical statistician in the United States. When the U.S. surgeon general concluded in 1964 that “cigarette smoking is causally related to lung cancer in men,” he cited Cornfield’s work.7 Nonexperimental studies had helped identify an association between smoking and lung cancer. With the help of Bayes’ rule—what Laplace had called “the probability of causes and future events, derived from past events”—Cornfield provided the theoretical justification for using case-control studies to estimate the strength of links between exposure and disease. Today, thanks to Cornfield, case-control studies are the primary tool epidemiologists use to identify likely causes of chronic diseases.
Over his career, Cornfield would become involved in every major public health problem of the day. Most of them, including smoking, the safety of polio vaccines, and the efficacy of diabetes treatments, were fiercely controversial.
To calm the statistics phobia of physicians and epidemiologists, Cornfield developed an easygoing bedside manner. Abandoning his serious phase, he cultivated an infectious laugh and an irrepressible air of informality. By mixing humor into conversations, telling stories, and laughing heartily he inspired tremendous confidence. Even his gait and prose became sprightly. Soon every biomedical scientist with a committee and a controversy wanted Cornfield on board. By pointing out common elements that everyone shared, he could unify the most disparate group. After one particularly onerous series of meetings and reports, a committee member asked him, “Did you get my last letter about sample size?” There was a pause, and Cornfield grinned and said, “Christ, I hope so.” When the committee finally produced its massive procedural manual, Cornfield waved it over his head, declaring, “You know, say what you will about the Ten Commandments, you must come back to the pleasant fact that there were only ten of them.”