
The Theory That Would Not Die


by Sharon Bertsch McGrayne


  In our struggle to survive in an uncertain and changing world, our sensory and motor systems often produce signals that are incomplete, ambiguous, variable, or corrupted by random fluctuations. If we put one hand under a table and estimate its location, we can be off by up to 10 centimeters. Every time the brain generates a command for action, we produce a slightly different movement. In this confusing world, Bayes has emerged as a useful theoretical framework. It helps explain how the brain may learn. And it demonstrates mathematically how we combine two kinds of information: our prior beliefs about the world and the error-fraught evidence from our senses.

  As Lindley emphasized years ago, if we are certain about the evidence relayed by our senses, we rely on them. But when faced with unreliable sensory data, we fall back on our prior accumulation of beliefs about the world.

  When Daniel Wolpert of Cambridge University tested the theory with a virtual tennis game, he showed that players unconsciously combine their prior knowledge of bouncing balls in general with sensory data about a particular ball coming over the net. In short, they unconsciously behave like good Bayesians. In addition, Wolpert said, the nice thing about Bayes was that it did not produce a single number. It made multiple predictions about every possible state given the sensory data. Thus, the tennis ball would most probably bounce in a particular spot—but there was also a reasonable chance it would fall elsewhere.

  According to Bayes, the brain stores a wide range of possibilities but assigns them high and low probabilities. Color vision is already known to operate this way. We think we perceive red, but we actually see an entire spectrum of colors, assign the highest probability to red, and keep in mind the outside possibilities that the color could be pink or purple.

  Wolpert concluded that Bayesian thinking is basic to everything a human does, from speaking to acting. The biological brain has evolved to minimize the world’s uncertainties by thinking in a Bayesian way. In short, growing evidence suggests that we have Bayesian brains.

  Given Bayes’ contentious past and prolific contributions, what will the future look like? The approach has already proved its worth by advancing science and technology from high finance to e-commerce, from sociology to machine learning, and from astronomy to neurophysiology. It is the fundamental expression for how we think and view our world. Its mathematical simplicity and elegance continue to capture the imagination of its users.

  But what about the years to come? Brute computer force can organize stunning quantities of information, but it clusters and searches for documents crudely, according to keywords. Only the brain examines documents and images according to their meaning and their content. Which approach will be more useful? Will computers become so powerful that huge amounts of data alone will teach us everything? Will scientists no longer need to theorize or hypothesize before experimenting or gathering their data? Or will Bayesian organizational principles remain fundamental? Current strategies for designing computers that could perform at biological levels exploit such ancient principles as reusable parts, hierarchical structures, variations on themes, and regulatory systems.

  The jumping-off point for this debate is Bayes and its priors, says Stuart Geman, whose Gibbs sampler helped launch the modern Bayesian revolution: “In this debate, there is no more powerful argument for Bayes than its recognition of the brain’s inner structures and prior expectations.” The old controversies between Bayesians and frequentists have been reframed around a single question: Do we use probabilities or not? Old or new, the issues are similar, if not identical, Geman says. And in its new guise, Bayesian learning and its priors occupy the heart of the debate.

  Can we look forward to a time when computers can compete with our biological brains for understanding? Will they be programmed with Bayes? Or with something else?

  Whatever the outcome of the revolution, Diaconis insists that Bayes will play a role. “Bayes is still young. Probability did not have any mathematics in it until 1700. Bayes grew up in data-poor and computationally poor circumstances. It hasn’t settled down yet. We can give it time.

  “We’re just starting.”

  appendix a

  dr fisher’s casebook: the doctor sees the light

  by michael j. campbell

  As one gets older one’s thoughts turn naturally to religion and I have been pondering the religious metaphors in statistics. Clearly the frequentists are metaphorical Catholics; dividing results into “significant” and “nonsignificant” instead of dividing sins into “mortal” (i.e. significant) and venial. Randomisation is the grace that saves the world. In confession the priest is interested in the frequency with which one committed a sin (I can imagine passing the priest a bar-chart of how many times I swore, or was uncharitable, rather than giving him a verbal list—so much more informative!) After confession frequentists/Catholics are forgiven and so, having rejected a null hypothesis at p < 0.05, once it is published they are free to use 0.05 as the limit again. The frequentist prayer is “Our Fisher, who art in Heaven”. Their saints are Pearson and Neyman. Instead of Heaven and Hell they have the Null and Alternative hypotheses, and in their Creed instead of “Do you reject Satan?” they have “Do you reject the null hypothesis?”.

  On the other hand Bayesians are born-again fundamentalists. One must be a “believer” and Bayesians can often pinpoint the day when Bayes came into their lives, when they dropped these childish frequentist ways (or even “came out”). Clearly the Reverend Thomas Bayes is their spiritual guide and leader, and he even imitated the Christian God by not publishing in his own lifetime (mind you, I have heard non-Bayesians wish that some of his followers had done likewise). Bayesians divide the world into people who believe and those who do not and will ask complete strangers at statistics conferences “are you a Bayesian?” as if it were an important defining characteristic. On finding a non-Bayesian, they will express amazement at the things the non-Bayesian does, point out the certainties of their own beliefs and attempt to convert the non-believer.

  Then there are the sects. The agnostics are those who think that nonparametric statistics are the answer to everything. Similarly the bootstrappers cannot see why you need to bring God into it at all. There is the “bell-curve” cult, who think everything can be explained by reference to the Normal distribution. The simulators think God is purely a human invention.

  Where do I put myself? Well, in typically woolly English fashion, I regard myself as Anglican. I believe in statistics as a way of finding the truth and am happy to adopt whatever means will help me to get there. I can see dangers in extremism in either direction and so try to steer a “middle way”. I still use p-values and confidence intervals but temper them with prior beliefs. I like the idea of “empirical Bayes” where one uses earlier studies to inform one’s priors. I can see the advantages of Bayesian methods for modelling complex systems and attaching uncertainty to parameters and think that in many ways it reflects scientific inference better. However, I prefer simply to call myself a believer, and not to attach labels to these beliefs.

  Talking of religion, I am reminded of a strip of cartoons about Bayesians that appeared some time ago. They showed a series of monks. One was looking lost, one was dressed as a soldier, one was holding a guide book and one had his tongue stuck out. They were, respectively, a vague prior, a uniform prior, an informative prior and, of course, an improper prior . . .

  appendix b

  applying bayes’ rule to mammograms and breast cancer

  In 2009 a U.S. government task force on breast cancer screening advised most women in their forties not to have annual mammograms. The public reaction was immediate and, in large part, enraged. Here’s a simple version of the Bayesian calculation that lay at the very heart of the controversy.

  A 40-year-old woman without any symptoms or family history of breast cancer has a mammogram as part of a routine checkup. A week later she receives a letter saying her test result was abnormal. She needs additional testing. What is the probability she actually has breast cancer?

  Quite low.

  Many beginning statistics students—and many physicians—find this surprising because mammograms, as a screening test, are reasonably accurate. They identify roughly 80% of 40-year-old women who have breast cancer at the time of their exam, and they provide positive test results to only about 10% of women without the disease.

  However, breast cancer is relatively rare. And Bayes’ rule takes background disease rates into account as prior knowledge. As a result, Bayes highlights the fact that not everyone who gets a positive test for a disease actually has that disease. It also underscores the fact that the probability of breast cancer is higher in a woman who finds a lump in her breast than in a woman who has a mammogram as part of a routine checkup.

  To illustrate, here is Bayes’ rule written for this problem:

  P(cancer | positive test) = P(positive test | cancer) × P(cancer) / P(positive test)

  According to this formula, we need three pieces of information, which will all go on the right-hand side of the equation:

  1. The probability of having breast cancer: This is our prior knowledge about the background disease rate of breast cancer among women in their forties at the time they get a mammogram. According to Cancer and JAMA, this is approximately 4/10 of 1%. Thus out of every 10,000 women in their forties who have mammograms, we can estimate that approximately 40 actually have the disease. The number: 40/10,000.

  2. The probability of a breast cancer patient getting a positive mammogram: According to the National Cancer Institute and evidence from mammography, approximately 32 of those 40 women with breast cancer will get a positive test result from the mammogram. The number: 32/10,000.

  3. The probability of getting a positive mammogram: The total number of women who get positive results (whether or not cancer is present) includes women with cancer and women who are falsely informed that they have the disease. Mammograms give a positive (“abnormal”) result to some women who do not have the disease; they are called false positives. For mammography, this rate is quite high, approximately 10%, according to the New England Journal of Medicine. Thus out of 10,000 women in their forties, 996 will get a letter telling them they have an abnormal test result. To rule out breast cancer, these women will need more mammography, ultrasound, or tissue sampling, perhaps even a biopsy. To this number must be added the 32 breast cancer patients per 10,000 who will get a positive mammogram. The total number: 1028/10,000 or a little more than 10% of the women screened.

  Inserting these numbers into the formula, we get the following:

  P(cancer | positive test) = (32/10,000) / (1028/10,000) = 32/1028

  Doing the arithmetic produces 0.03, or 3%. Thus the probability that a woman who tests positive has breast cancer is only 3%. She has 97 chances out of 100 of being disease free.
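  The arithmetic above can be checked with a short script. All rates come from the figures quoted in the text; the variable names are mine:

```python
# Bayes' rule applied to the mammogram example, using the rates quoted above.

prior = 40 / 10_000          # P(cancer): background rate among women in their forties
sensitivity = 0.8            # P(positive | cancer): 32 of the 40 patients test positive
false_positive_rate = 0.10   # P(positive | no cancer)

true_positives = prior * sensitivity                  # 32 per 10,000
false_positives = (1 - prior) * false_positive_rate   # 996 per 10,000
p_positive = true_positives + false_positives         # 1028 per 10,000

posterior = true_positives / p_positive               # P(cancer | positive test)
print(round(posterior, 2))                            # prints 0.03, i.e., about 3%
```

  Note how the tiny prior (0.4%) drags the posterior down to 3% even though the test catches 80% of cancers.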

  None of this is static. Each time more research data become available, Bayes’ rule should be recalculated.

  As far as Bayes is concerned, universal screening for a disease that affects only 4/10 of 1% of the population may subject many healthy women to needless worry and to additional treatment which in turn can cause its own medical problems. In addition, the money spent on universal screening could potentially be used for other worthwhile projects. Thus Bayes highlights the importance of improving breast cancer screening techniques and reducing the number of false positives. Another fact also points to the need for better mammography: negative test results miss 1 in 5 cancers.

  To apply Bayes’ rule to other problems, here is the general equation:

  P(A | B) = P(B | A) × P(A) / P(B)

  where A is a hypothesis and B is data.
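  As a minimal sketch (the function and its parameter names are my own, not from the text), the general rule can be wrapped in a small helper. The denominator P(B) is expanded with the law of total probability over A and not-A, just as the mammogram example summed true and false positives:

```python
def bayes_posterior(prior: float, likelihood: float, false_alarm: float) -> float:
    """P(A | B) via Bayes' rule, with P(B) expanded over A and not-A.

    prior       -- P(A), belief in the hypothesis before seeing the data
    likelihood  -- P(B | A), probability of the data if A is true
    false_alarm -- P(B | not A), probability of the data if A is false
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)  # P(B)
    return likelihood * prior / evidence

# The numbers from appendix b recover the same result:
print(round(bayes_posterior(prior=0.004, likelihood=0.8, false_alarm=0.10), 2))  # 0.03
```

  The same three inputs (prior, likelihood, false-alarm rate) are all any screening-test problem of this shape requires.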

  notes

  1. Causes in the Air

  1. Two errors about Bayes’ death and portrait have been widely disseminated. First, Bayes died on April 7, 1761, according to cemetery records and other contemporaneous documents gathered by Bayes’ biographers, Andrew Dale and David Bellhouse. Bayes was interred on April 15, which is often called the date of his death. The degraded condition of his vault may have contributed to the confusion.

  Second, the often-reproduced portrait of Thomas Bayes is almost assuredly of someone else named “T. Bayes.” The sketch first appeared in 1936 in History of Life Insurance in its Formative Years by Terence O’Donnell. However, the picture’s caption on page 335 says it is of “Rev. T. Bayes, Improver of the Columnar Method developed by Barrett,” and Barrett did not develop his method until 1810, a half-century after the death of “our” Rev. Thomas Bayes.

  Bellhouse (2004) first noticed that the portrait’s hairstyle is anachronistic. Sharon North, curator of Textiles and Fashion at the Victoria and Albert Museum, London, agrees: “The hairstyle in this portrait looks very 20th century. . . . Clerical dress is always difficult as the robes and bands (collars) change very little over time. However, I would say that the hair of the man . . . is quite wrong for the 1750s. He would have been wearing a wig for a portrait. Clergymen wore a style of the bob wig (which eventually became known as a ‘clerical wig’), a short very bushy wig of horsehair powdered white.”

  2. Dale (1999) 15.

  3. All of Bayes’ and Price’s quotations come from their essay.

  2. The Man Who Did Everything

  1. For details of Laplace’s personal life, I rely on Hahn (2004, 2005). All documents about Laplace’s life were thought to have been lost when a descendant’s home was destroyed by fire in 1925, but Hahn painstakingly located many original documents that revealed new facts and corrected previous assumptions about Laplace’s life and work.

  2. “A dizzying expansion of curiosity” is Daniel Roche’s original phrase in his classic, France in the Enlightenment.

  3. Voltaire 24.

  4. Koda and Bolton 21.

  5. Stigler (1978) 234–35.

  6. Laplace (1774) OC (8) 27; Laplace (1783/1786) OC (11) 37, and Stigler (1986) 359.

  7. Laplace (1776) 113. For English translation, see Hahn in Lindberg and Numbers (1986) 268–70.

  8. Laplace (1783) OC (10) 301.

  9. Laplace in Dale’s translation (1994) 120, in section titled “Historical note on the probability calculus.”

  10. Gillispie (1997) 23.

  11. Laplace (1782–85) OC (10) 209–340.

  12. Laplace (1778–81) OC (9) 429 and (1783/1786) OC (10) 319.

  13. Laplace (1778–81) OC (9) 429.

  14. “Easy to see . . . obvious:” Laplace (1778/1781) OC (9) 383–485. The student was Jean-Baptiste Biot.

  15. Stigler (1986) 135.

  16. Gillispie (1997) 81.

  17. Laplace (1783/1786) OC (10) 295–338.

  18. Hald (1998) 236 and, for a detailed discussion of Laplace’s birth studies, 230–45.

  19. Laplace in Philosophical Essay on Probabilities, Dale’s translation 77.

  20. Hahn (2004) 104.

  21. Sir William Herschel wrote a firsthand account in his diary. See Dreyer vol. I, lxii, and Hahn in Woolf.

  22. Laplace from Exposition du Système du Monde in Crosland 90.

  23. Glenn Shafer interview.

  24. Laplace, Essai Philosophique, translated in Hahn (2005) 189 and in Dale (1995) 124.

  3. Many Doubts, Few Defenders

  1. Clerke 200–203.

  2. Bell ix and 172–82.

  3. David 30.

  4. Gillispie (1997) 67, 276–77.

  5. Pearson (1929) 208.

  6. Porter (1986) 36.

  7. Mill in Gigerenzer et al. (1989) 33.

  8. Quoted by Dale (1998) 261.

  9. G. Chrystal in Hald (1998) 275.

  10. Le procès Dreyfus vol. 3, 237–31.

  11. Molina in Bailey (1950) 95–96.

  12. Rubinow (1914) 13.

  13. Rubinow (1917) 35.

  14. Rubinow (1914–15) 14.

  15. Rubinow (1917) 42.

  16. Rubinow (1914–15) 14.

  17. Anonymous in Pruitt (1964) 151.

  18. Whitney (1918) 287.

  19. Pruitt 169.

  20. Ibid., 170.

  21. Ibid.

  22. Pearson in MacKenzie (1981) 204.

  23. J. L. Coolidge in Hald (1998) 163.

  24. Kruskal 1026.

  25. Savage (1976) 445–46.

  26. Leonard Darwin in MacKenzie (1981) 19.

  27. Fisher in Box (2006) 127.

  28. Fisher (1925) 1.

  29. Kruskal 1026.

  30. Ibid., 1029.

  31. Fisher in Kotz and Johnson I 13.

  32. Fisher in Gill 122.

  33. Fisher (1925) 9–11.

  34. Hald (1998) 733.

  35. Savage (1976) 446.

  36. E. Pearson in Reid 55–56.

  37. Perks 286.

  38. Fisher in Neyman Supplement 154–57.

  39. Tukey, according to Brillinger e-mail.

  40. aip.org/history/curie/scandal. Accessed April 18, 2006.

  41. De Finetti (1972).

  42. Lindley letter to author.

  43. Essen-Möller in Kaye (2004).

  44. Huzurbazar 19.

  45. Lindley (1983) 14.

  46. Howie 126.

  47. Ibid., 210.

  48. Jeffreys (1939) 99.

  49. Lindley (1991) 11.

  50. Ibid., 391.

  51. Jeffreys (1938) 718.

  52. Jeffreys (1939) v.

  53. Goodman (2005) 284.

  54. Howie 165.

  55. Box (1978) 441.

  56. Lindley (1986a) 43.

  57. Jeffreys (1961) 432.

  58. Lindley (1983) 8.

  59. Lindley (1991) 10.

  4. Bayes Goes to War

  1. Churchill 598.

  2. Peter Twinn in Copeland (2006) 567.

  3. Atkinson and Feinberg 36.

  4. D. G. Kendall in ibid., 48.

 
