The Theory That Would Not Die
The realization that Laplace was one of the world’s first modern professional scientists emerged slowly. The statistician Karl Pearson, no shrinking violet, called the author of the Britannica article “one of the most superficial writers that ever obscured the history of science. . . . Such statements published by a writer of one nation about one of the most distinguished men of a second nation, and wholly unsubstantiated by references, are in every way deplorable.”5 Modern historians have shown that many of the disparaging comments about Laplace’s life and work were false.
Personal insults aside, Laplace launched a craze for statistics that would ultimately inundate both Bayes’ original rule and Laplace’s own version of it. He did so by publicizing in 1827 the then-extraordinary fact that the number of dead letters in the Parisian postal system remained roughly constant from year to year. After the French government published a landmark series of statistics about the Paris region, it appeared that many irrational and godless criminal activities, including thefts, murders, and suicides, were also constants. By 1830 stable statistical ratios were firmly dissociated from divine providence, and Europe was swept by a veritable mania for the objective numbers needed by good government.
Unsettled by rapid urbanization, industrialization, and the rise of a market economy, early Victorians formed private statistical societies to study filth, criminality, and numbers. The chest sizes of Scottish soldiers, the number of Prussian officers killed by kicking horses, the incidence of cholera victims—statistics were easy to collect. Even women could do it. No mathematical analysis was necessary or expected. That most of the government bureaucrats collecting statistics were ignorant of and even hostile to mathematics did not matter. Facts, pure facts, were the order of the day.
Gone was the idea that we can use probability to quantify our lack of knowledge. Gone was the search for causes conducted by Bayes, Price, and Laplace. A correspondent admonished the hospital reformer Florence Nightingale in 1861, “Again I must repeat my objections to intermingling Causation with Statistics. . . . The statistician has nothing to do with causation.”6
“Subjective” also became a naughty word. The French Revolution and its aftermath shattered the idea that all rational people share the same beliefs. The Western world split between Romantics, who rejected science outright, and those who sought certainty in natural science and were enthralled by the objectivity of numbers, whether the number of knifings or of marriages at a particular age.
During the decade after Laplace’s death, four European revisionists led the charge against Laplace and probability, the mathematics of uncertainty. John Stuart Mill denounced probability as “an aberration of the intellect” and “ignorance . . . coined into science.”7 Objectivity became a virtue, subjectivity an insult, and the probability of causes a target of skepticism, if not hostility. Awash in newly collected data, the revisionists preferred to judge the probability of an event according to how frequently it occurred among many observations. Eventually, adherents of this frequency-based probability became known as frequentists or sampling theorists.
To frequentists Laplace was such a towering target that Thomas Bayes’ existence barely registered. When critics thought of Bayes’ rule, they thought of it as Laplace’s rule and focused their criticism on him and his followers. Arguing that probabilities should be measured by objective frequencies of events rather than by subjective degrees of belief, they treated the two approaches as opposites, although Laplace had considered them basically equivalent.
The reformers denounced Laplace’s pragmatic simplifications as gross abuses. Two of his most popular applications of probability were condemned wholesale. Laplace had asked: given that the sun has risen thousands of times in the past, will it rise tomorrow? and given that the planets revolve in similar ways around the sun, is there a single cause of the solar system? He did not actually use Bayes’ rule for either project, only simple gambling odds. Sometimes, though, he and his followers began answering these questions by assuming 50–50 odds. The simplification would have been defensible had Laplace known nothing about the heavens. But he was the world’s leading mathematical astronomer, and he understood better than anyone that sunrises and nebulae were the result of celestial mechanics, not gambling odds. He had also started his study of male and female birthrates with 50–50 odds, although scientists already knew that the likelihood of a male birth is approximately 0.52.
Laplace agreed that reducing scientific questions to chance boosted the odds in favor of his deep conviction that physical phenomena have natural causes rather than religious ones. He warned his readers about it. His followers also sometimes weighted their initial odds heavily in favor of natural laws and weakened counterexamples. Critics pounded away at the fact that chance was irrelevant to the questions at hand. They identified Bayes’ rule with equal priors and damned the entire rule because of them. Few of the critics tried to even imagine other kinds of priors.
Years later John Maynard Keynes studied the complaints made about Laplace’s assessment, based on 5,000 years of history, that “it is a bet of 1,826,214 to 1 that [the sun] will rise tomorrow.” Summarizing the arguments, Keynes wrote that Laplace’s reasoning “has been rejected by [George] Boole on the ground that the hypotheses on which it is based are arbitrary, by [John] Venn on the ground that it does not accord with experience, by [Joseph] Bertrand because it is ridiculous, and doubtless by others also. But it has been very widely accepted—by [Augustus] de Morgan, by [William] Jevons, by [Rudolf] Lotze, by [Emanuel] Czuber, and by Professor [Karl] Pearson—to name some representative writers of successive schools and periods.”8
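The arithmetic behind a bet of this kind can be sketched in a few lines. The sketch assumes what is usually called Laplace's rule of succession: with a uniform prior over the unknown chance of a sunrise, the probability of one more sunrise after n unbroken sunrises is (n + 1)/(n + 2). It also assumes a year of 365.2426 days. Neither assumption is spelled out in the passage above, and the function name is purely illustrative.

```python
from fractions import Fraction


def rule_of_succession(successes: int, trials: int) -> Fraction:
    """Probability of success on the next trial, assuming a uniform prior."""
    return Fraction(successes + 1, trials + 2)


# Assumption: 5,000 years of sunrises at 365.2426 days per year.
# Integer arithmetic avoids floating-point rounding in the day count.
days = 5000 * 3_652_426 // 10_000

p_sunrise = rule_of_succession(days, days)
odds_for = p_sunrise / (1 - p_sunrise)  # odds in favor of a sunrise tomorrow

print(f"{days:,} sunrises assumed")
print(f"odds of about {int(odds_for):,} to 1")
```

Under these assumptions the odds come out at a little over 1.8 million to 1, the same order of magnitude as the figure Keynes quotes. The critics' objection was to the starting assumption of equal odds, not to the arithmetic.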
Amid the dissension, Laplace’s delicate balance between subjective beliefs and objective frequencies collapsed. He had developed two theories of probability and shown that when large numbers are involved they lead to more or less the same results. But if natural science was the route to certain knowledge, how could it be subjective? Soon scientists were treating the two approaches as diametric opposites. Lacking a definitive experiment to decide the controversy and with Laplace demonstrating that both methods often lead to roughly the same result, the tiny world of probability experts would be hard put to settle the argument.
Research into probability mathematics petered out. Within two generations of his death Laplace was remembered largely for astronomy. By 1850 not a single copy of his massive treatise on probability was available in Parisian bookstores. The physicist James Clerk Maxwell learned about probability from Adolphe Quetelet, a Belgian factoid hunter, not from Laplace, and adopted frequency-based methods for statistical mechanics and the kinetic theory of gases. Laplace and Condorcet had expected social scientists to be the biggest users of Bayes’ rule, but they were reluctant to adopt any form of probability. The American scientist and philosopher Charles Sanders Peirce promoted frequency-based probability during the late 1870s and early 1880s. In 1891 a Scottish mathematician, George Chrystal, composed an obituary for Laplace’s method: “The laws of . . . Inverse Probability being dead, they should be decently buried out of sight, and not embalmed in text-books and examination papers. . . . The indiscretions of great men should be quietly allowed to be forgotten.”9
For the third time Bayes’ rule was left for dead. The first time, Bayes himself had shelved it. The second time, Price revived it briefly before it again died of neglect. This time theoreticians buried it.
The funeral was a trifle premature. Despite Chrystal’s condemnation, Bayes’ rule was still taught in textbooks and classrooms and used by astronomers because the anti-Bayesian frequentists had not yet produced a systematic, practical substitute. In scattered niches far from the eyes of disapproving theoreticians, Bayes bubbled along, helping real-life practitioners assess evidence, combine every possible form of information, and cope with the gaps and uncertainties in their knowledge.
Into this breach between theoretical disapproval and practical utility marched the French army, under the baton of a politically powerful mathematician named Joseph Louis François Bertrand. Bertrand reformed Bayes for artillery field officers who dealt with a host of uncertainties: the enemy’s precise location; air density; wind direction; variations among their hand-forged cannons; and the range, direction, and initial speed of projectiles. In his widely used textbooks Bertrand preached that Laplace’s probability of causes was the only valid method for verifying a hypothesis with new observations. He believed, however, that Laplace’s followers had lost their way and must stop their practice of indiscriminately using 50–50 odds for prior causes. To illustrate, he told about the foolish peasants of Brittany who, looking for the possible causes of shipwrecks along their rocky coast, assigned equal odds to the tides and to the far more dangerous northwest winds. Bertrand argued that equal prior odds should be confined to those rare cases when hypotheses really and truly were equally likely or when absolutely nothing was known about their likelihoods.
Following Bertrand’s strict standards, artillery officers began assigning equal probabilities only for cannons made in the same factory by roughly the same staff using identical ingredients and processes under identical conditions. For the next 60 years, between the 1880s and the Second World War, French and Russian artillery officers fired their weapons according to Bertrand’s textbook.
Bertrand’s strict Bayesian reforms figured in the Dreyfus affair, the scandal that rocked France between 1894 and 1906. Alfred Dreyfus, a French Jew and army officer, was falsely convicted of spying for Germany and condemned to life imprisonment. Almost the only evidence against Dreyfus was a letter he was accused of having sold to a German military attaché. Alphonse Bertillon, a police criminologist who had invented an identification system based on body measurements, testified repeatedly that, according to probability mathematics, Dreyfus had most assuredly written the incriminating letter. Bertillon’s notions of probability were mathematical gibberish, and he developed ever more fantastical arguments. As conservative antirepublicans, Roman Catholics, and anti-Semites supported Dreyfus’s conviction, a campaign to exonerate him was organized by his family, anticlericals, Jews, and left-wing politicians and intellectuals led by the novelist Émile Zola.
At Dreyfus’s military trial in 1899, his lawyer called on France’s most illustrious mathematician and physicist, Henri Poincaré, who had taught probability at the Sorbonne for more than ten years. Poincaré believed in frequency-based statistics. But when asked whether Bertillon’s document was written by Dreyfus or someone else, he invoked Bayes’ rule. Poincaré considered it the only sensible way for a court of law to update a prior hypothesis with new evidence, and he regarded the forgery as a typical problem in Bayesian hypothesis testing.
Poincaré provided Dreyfus’s lawyer with a short, sarcastic letter, which the lawyer read aloud to the courtroom: Bertillon’s “most comprehensible point [is] false. . . . This colossal error renders suspect all that follows. . . . I do not understand why you are worried. I do not know if the accused will be condemned, but if he is, it will be on the basis of other proofs. Such arguments cannot impress unbiased men who have received a solid scientific education.”10 At that point, according to the court stenographer, the courtroom erupted in a “prolonged uproar.” Poincaré’s testimony devastated the prosecution; all the judges had attended military schools and studied Bayes in Bertrand’s textbook.
The judges issued a compromise verdict, again finding Dreyfus guilty but reducing his sentence to five years. The public was outraged, however, and the president of the Republic issued a pardon two weeks later. Dreyfus was promoted and awarded the Legion of Honor, and government reforms were instituted to strictly separate church and state. Many American lawyers, unaware that probability helped to free Dreyfus, have considered his trial an example of mathematics run amok and a reason to limit the use of probability in criminal cases.
As the First World War approached, a French general and proponent of military aviation and tanks, Jean Baptiste Eugène Estienne, developed elaborate Bayesian tables telling field officers how to aim and fire. Estienne also developed a Bayesian method for testing ammunition. After Germany captured France’s industrial base in 1914, ammunition was so scarce that the French could not use wasteful frequency-based methods to test its quality. Mobilized for the national defense, professors of abstract mathematics developed Bayesian testing tables that required destroying only 20 cartridges in each lot of 20,000. Instead of conducting a predetermined number of tests, the army could stop when they were sure about the lot as a whole. During the Second World War, American and British mathematicians discovered similar methods and called them operations research.
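The idea of stopping as soon as the evidence is decisive can be illustrated with a small sketch. It is not a reconstruction of the French army's tables, whose details the passage does not give. It assumes a uniform prior over a grid of possible defect rates, a lot large enough that each test firing can be treated as an independent draw, a hypothetical acceptance limit of 10% duds, and a hypothetical 95% certainty requirement. The function name and all parameters are illustrative.

```python
def sequential_lot_test(results, max_defect_rate=0.10, certainty=0.95, grid_size=1001):
    """Update beliefs about a lot's defect rate one test cartridge at a time.

    results: iterable of booleans, True meaning the cartridge was a dud.
    Returns (decision, cartridges_used); decision is "accept", "reject",
    or "undecided" if the results run out before either threshold is met.
    Every threshold here is an illustrative assumption, not a historical value.
    """
    rates = [i / (grid_size - 1) for i in range(grid_size)]
    posterior = [1.0 / grid_size] * grid_size  # uniform prior over defect rates
    used = 0
    for dud in results:
        used += 1
        # Bayes' rule: weight each candidate rate by the likelihood of this result.
        posterior = [p * (r if dud else 1.0 - r) for p, r in zip(posterior, rates)]
        total = sum(posterior)
        posterior = [p / total for p in posterior]
        # Posterior probability that the lot meets the acceptance limit.
        p_good = sum(p for p, r in zip(posterior, rates) if r <= max_defect_rate)
        if p_good >= certainty:
            return "accept", used
        if 1.0 - p_good >= certainty:
            return "reject", used
    return "undecided", used


decision, used = sequential_lot_test([False] * 100)
print(decision, used)
```

With these assumed thresholds, an unbroken run of good cartridges leads to acceptance after a few dozen firings rather than a fixed, larger sample. A more informative prior or different thresholds would change that count.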
Bayes’ rule was supposedly still dying on the vine as the First World War approached and the United States faced two emergencies caused by the country’s rapid industrialization. In each case self-taught statisticians resorted to Bayes as a tool for making informed decisions, first about telephone communications and second about injured workers.
The first crisis occurred when the financial panic of 1907 threatened the survival of the Bell telephone system owned by American Telephone and Telegraph Company. Alexander Graham Bell’s patents had expired a few years earlier, and the company had overexpanded. Only the intervention of a banking consortium led by the House of Morgan prevented Bell’s collapse.
At the same time, state regulators were demanding proof of Bell’s ability to provide better and cheaper service than local competitors. Unfortunately, Bell telephone circuits were often overloaded in the late morning and early afternoon, when too many customers tried to place calls at the same time. During the rest of the day—80% of the time—Bell’s facilities were under-utilized. No company could afford to build a system to handle every call that could conceivably be made at peak times.
Edward C. Molina, an engineer in New York City, considered the uncertainties involved. Molina, whose family had emigrated from Portugal via France, was born in New York in 1877. He graduated from a city high school but, with no money for college, got a job first at Western Electric Company and then at AT&T’s engineering and research department (later called Bell Laboratories). The Bell System of phone companies was adopting a new mathematical approach to problem solving. Molina’s boss, George Ashley Campbell, had studied probability with Poincaré in France but other employees were learning it from the Encyclopaedia Britannica. Molina taught himself mathematics and physics and became the nation’s leading expert on Bayesian and Laplacean probability.
Unlike many others at the time, he realized that “great confusion exists because many authorities have failed to distinguish clearly between the original Bayes inverse theorem and its subsequent generalization by Laplace. The general theorem embraces, or brings together, both the data obtained from a series of observations and whatever ‘collateral’ information exists in relation to the observed results.”11 As Molina explained, applied statisticians were often forced to make quick decisions based on meager observational data; in such cases, they had to rely on indirect prior knowledge, called collateral information. This could range from assessments of national or historic trends to an executive’s mental health. Methods for utilizing both statistical and nonstatistical types of evidence were needed.
Using Laplace’s formula, Molina combined his prior information about the economics of automating Bell telephone systems with data about telephone call traffic, call length, and waiting time. The result was a cost-effective way for Bell to deal with uncertainties in telephone usage.
Molina then worked on automating Bell’s labor-intensive system. In many cities the company employed 8 to 20% of the female population as telephone operators, switching wires to route calls through trunking facilities to customers in distant exchanges. Operators were in short supply, annual turnover in some cities was 100% or more, and wages doubled between 1915 and 1920. Depending on one’s point of view, the work epitomized either opportunities for women or the inhumane pressure of modern technology.
To automate the system Molina conceived of the relay translator, which converted decimally dialed phone numbers into routing instructions. Then he used Bayes to analyze technical information and the economics of various combinations of switches, selectors, and trunking lines at particular exchanges. After women won the right to vote in 1920, Bell feared a backlash if it fired all its operators, so it chose an automating method that merely halved their numbers. Between the world wars, employment of operators dropped from 15 to 7 per 1,000 telephones even as toll call service increased. Probability assumed an important role in the Bell System, and Bayesian methods were used to develop basic sampling theory.
Molina won prestigious awards, but his use of Bayes remained controversial among some Bell mathematicians, and he complained he had trouble publishing his research. Some of his problems may have stemmed from his colorful character. He loved model boats, published articles about Edgar Allan Poe’s use of probability, played the piano expertly, and donated to the Metropolitan Opera in New York. He followed the Russo-Japanese war so avidly that his colleagues nicknamed him, not fondly, General Molina. When he independently discovered the Poisson distribution, he called it the Molina distribution until he learned to his embarrassment that Laplace’s protégé Siméon Denis Poisson had written about it in the 1830s.
Molina’s enthusiasm for Bayes-Laplacean probability did not spread to other American corporations. AT&T often regarded his articles about Bayes as proprietary secrets and published them only in in-house publications years after the fact.
While Bayes’ rule was helping to save the Bell System, financiers were rushing to build American railroads and industries. Government safety regulations were nonexistent, however, and 1 out of every 318 industrial workers was killed on the job between 1890 and 1910, and many more were injured. The country’s labor force suffered more accidents, sickness, invalidity, premature old age, and unemployment than European workers. Yet, unlike most of Europe, the United States had no system for insuring sick and injured workers, and most blue-collar families lived one paycheck away from needing charity. Federal judges ruled that injured employees could sue only if their bosses were personally at fault. In 1898 a U.S. Department of Labor statistician could think of no other social or legal reform in which the United States lagged so far behind other nations.