22. Koertge, “Belief Buddies,” 177–179.
23. Helen Longino, Science as Social Knowledge: Values and Objectivity in Scientific Inquiry (Princeton: Princeton University Press, 1990).
24. Longino, Science as Social Knowledge, 66–67.
25. Longino, Science as Social Knowledge, 69, 74.
26. Longino, Science as Social Knowledge, 216. For a fascinating contrast, one might consider Miriam Solomon’s Social Empiricism (Cambridge, MA: MIT Press, 2001), in which she embraces the idea that the actions of the “aggregate community of scientists” trump how “individual scientists reason” (135), but disagrees with Longino on the question of whether such social interactions can “correct” individual biases in reasoning (139). While Solomon finds much to praise in Longino’s point of view, she criticizes her for not embedding her “ideal” claims in more actual scientific cases. Indeed, Solomon maintains that when one looks at scientific cases it becomes clear that biases—even cognitive biases—play a positive role in facilitating scientific work. Far from thinking that science should eschew bias, Solomon makes the incredible claim that “scientists typically achieve their goals through the aid of ‘biased’ reasoning” (139).
27. Some of these errors, of course, are the result of cognitive biases that all human beings share. The point here is not that scientists do not have these biases, but that science writ large is committed to reducing them through group scrutiny of individual ideas. For more on this, see Kevin deLaplante’s Critical Thinker Academy podcast: https://www.youtube.com/watch?v=hZkkY2XVzdw&index=5&list=PLCD69C3C29B645CBC.
28. As noted, in chapter 7 we will discuss at length the issue of scientific misconduct that amounts to outright deception, such as data fabrication and manipulation.
29. G. King, R. Keohane, and S. Verba, Designing Social Inquiry: Scientific Inference in Qualitative Research (Princeton: Princeton University Press, 1994). Where possible, it is helpful to use statistical evidence in science. Where this is not possible, however, that is no excuse for being less than rigorous in one’s methodology.
30. As expected, reported significance levels in published research tend to cluster at the 5 percent threshold.
31. For a comprehensive and rigorous look at a number of foundational issues in statistics and its relationship to the philosophy of science, one could do no better than Deborah Mayo’s classic Error and the Growth of Experimental Knowledge (Chicago: University of Chicago Press, 1996). Here Mayo offers not only a rigorous philosophical examination of what it means to learn from evidence, but also details her own “error-statistical” approach as an alternative to the popular Bayesian method. The essential question for scientific reasoning is not just whether one learns from empirical evidence but how. Mayo’s championing of the role of experiment, severe testing, and the search for error makes this essential reading for those who wish to learn more about how to defend the claims of science as they pertain to statistical reasoning.
32. It is important to remember, though, that there is still a nonzero chance that even the most highly correlated events are causally unrelated.
33. The term p-hacking was coined by Simmons et al. in their paper “False Positive Psychology.”
34. M. Head, “The Extent and Consequences of P-Hacking in Science,” PLOS Biology 13, no. 3 (2015): e1002106, http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106.
35. One used to have to compute t-tests and F-tests by hand and then look up the corresponding p-value in a table.
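A minimal sketch (my own illustration, not part of the original note) of how the same calculation is handled today: a standard statistics library returns the t statistic and its p-value in a single call, with no printed tables required. The sample data and the use of SciPy here are assumptions made only for illustration.

```python
# Hypothetical two-sample t-test: the p-value is computed directly,
# rather than looked up in a table as in the days of hand calculation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=30)  # hypothetical sample A
group_b = rng.normal(loc=0.5, scale=1.0, size=30)  # hypothetical sample B

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```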
36. Simmons et al., “False Positive Psychology,” 1359.
37. Simmons et al., “False Positive Psychology,” 1359.
38. Steven Novella, “Publishing False Positives,” Neurologica (blog), Jan. 5, 2012, http://theness.com/neurologicablog/index.php/publishing-false-positives/.
39. Christie Aschwanden, “Science Isn’t Broken: It’s Just a Hell of a Lot Harder Than We Give It Credit For,” FiveThirtyEight, Aug. 19, 2015, https://fivethirtyeight.com/features/science-isnt-broken/.
40. J. Ioannidis, “Why Most Published Research Findings Are False,” PLOS Medicine 2, no. 8 (2005): e124, http://robotics.cs.tamu.edu/RSS2015NegativeResults/pmed.0020124.pdf.
41. See Ronald Giere, Understanding Scientific Reasoning (New York: Holt, Rinehart, and Winston, 1984), 153.
42. R. Nuzzo, “Scientific Method: Statistical Errors,” Nature 506 (2014): 150–152.
43. Head, “Extent and Consequences of P-Hacking.”
44. Nuzzo, “Scientific Method.”
45. Nuzzo, “Scientific Method.”
46. “Scientists’ overreliance on p-values has led at least our journal to decide it has had enough of them. In February [2015], Basic and Applied Psychology announced that it will no longer publish p-values. ‘We believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research,’ the editors wrote in their announcement. Instead of p-values, the journal will require ‘strong descriptive statistics, including effect sizes.’ ” Aschwanden, “Science Isn’t Broken.”
47. Head, “Extent and Consequences.” One should note, though, that Head’s findings have been disputed by some who claim that they are an artifact of rounding at the second decimal. This isn’t to say that p-hacking doesn’t occur, just that the bump that Head purports to find at 0.05 in her p-curves is not good evidence that it is quite so widespread. C. Hartgerink, “Reanalyzing Head et al. (2015): No Widespread P-Hacking After All?” Authorea, Sept. 12, 2016, https://www.authorea.com/users/2013/articles/31568/_show_article.
48. S. Novella, “P-Hacking and Other Statistical Sins,” Neurologica (blog), Feb. 13, 2014, http://theness.com/neurologicablog/index.php/p-hacking-and-other-statistical-sins/.
49. Or, in some cases, how it is measured. If one were a Bayesian, for instance, and thought that it was important to include an assessment of the prior probability of a hypothesis, how could this subjective feature be measured?
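For readers who want the formula behind this worry, here is the standard form of Bayes’ theorem (added for illustration; the labels H for the hypothesis and E for the evidence are my own, not the note’s):

```latex
% Bayes' theorem: the posterior probability of hypothesis H given evidence E.
% The prior P(H) must be supplied before the evidence is weighed, and it is
% this subjective ingredient that the note asks how one could measure.
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```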
50. Head, “Extent and Consequences.”
51. Simmons et al., “False Positive Psychology,” 1362–1363.
52. Simmons et al., “False Positive Psychology,” 1365.
53. Novella, “P-Hacking.”
54. And, even if it does not, the “bots” are coming. In a recent paper entitled “The Prevalence of Statistical Reporting Errors in Psychology (1985–2013),” Behavior Research Methods 48, no. 4 (2016): 1205–1226, https://mbnuijten.files.wordpress.com/2013/01/nuijtenetal_2015_reportingerrorspsychology1.pdf, M. Nuijten et al. report the results of a newly created software package called “statcheck” that can be used to check for statistical reporting errors in any APA-style paper. So far, 50,000 papers in psychology have been audited, with half showing some sort of mathematical error (usually tending in the author’s favor). These results are published on the PubPeer website (which some have called “methodological terrorism”), but, as the authors point out, the goal is to deter future errors, so there is now a web application that allows authors to check for errors in their own work before their papers are submitted. This too would be a cultural change in science, enabled by better technology. See also Brian Resnick, “A Bot Crawled Thousands of Studies Looking for Simple Math Errors: The Results Are Concerning,” Vox, Sept. 30, 2016, http://www.vox.com/science-and-health/2016/9/30/13077658/statcheck-psychology-replication.
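To make the idea concrete, here is a minimal sketch of the kind of consistency check a tool like statcheck performs; this is my own illustration, not the statcheck package itself, and the function name, tolerance, and example numbers are assumptions. The check simply recomputes the p-value implied by a reported t statistic and its degrees of freedom and flags a mismatch with the p-value the authors reported.

```python
# Hypothetical statcheck-style consistency check (not the actual package):
# recompute the two-tailed p-value from a reported t statistic and degrees
# of freedom, then compare it with the p-value reported in the paper.
from scipy import stats

def check_reported_t(t_stat: float, df: int, reported_p: float,
                     tolerance: float = 0.0005) -> bool:
    """Return True if the reported p-value matches the recomputed one."""
    recomputed_p = 2 * stats.t.sf(abs(t_stat), df)
    return abs(recomputed_p - reported_p) <= tolerance

# Example: a paper reports "t(28) = 2.05, p = .03". The recomputed p is
# about .0499, so the reported value would be flagged as inconsistent.
print(check_reported_t(t_stat=2.05, df=28, reported_p=0.03))  # False
```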
55. Indeed, in some cases peer review is subject to double- or even triple-blind reviewing, where, in addition to the author not knowing who the reviewer is, the reviewer does not know who the author is, and the editor may not know either.
56. John Huizenga, Cold Fusion: The Scientific Fiasco of the Century (Rochester: University of Rochester Press, 1992), 235.
57. Huizenga, Cold Fusion, 215–236.
58. Huizenga, Cold Fusion, 218.
59. Huizenga, Cold Fusion, 218.
60. Huizenga, Cold Fusion, 57.
61. Gary Taubes, Bad Science: The Short Life and Weird Times of Cold Fusion (New York: Random House, 1993).
62. Details of each of these “confirmations” can be found in Taubes, Bad Science, where he explores the problems with each: the “Boron” paper was retracted (owing to a discovery that their neutron detector was heat sensitive); the “theoretical” paper about helium was suspect because it had been written expressly to fit the cold fusion findings; and the “excess heat” results in heavy water were flawed by the fact that they also found a reaction in light water, which could be explained by a chemical reaction.
63. Solomon Asch, “Opinions and Social Pressure,” Scientific American 193, no. 5 (Nov. 1955): 31–35, https://www.panarchy.org/asch/social.pressure.1955.html.
64. Taubes, Bad Science, 162.
65. Taubes, Bad Science, 162.
66. Wicherts et al., “Willingness to Share Research Data.”
67. Fleischmann is guilty of this as well, but it was Pons who repeatedly refused to share data.
68. John Maddox, quoted in Taubes, Bad Science, 240.
69. Taubes, Bad Science, 191.
70. Taubes, Bad Science, 191.
71. Taubes, Bad Science, 197.
72. Huizenga, Cold Fusion, 234.
73. The fact that there are some people who commit murder does not refute the fact that we live in a society that has laws against murder and attempts to enforce them.
74. Even here, however, this does not necessarily prove bad motive. Perhaps the people who are more scrupulous about keeping their data in good shape so that it can be shared are also those who are most likely to be meticulous in their method.
75. Stuart Firestein, Failure: Why Science Is So Successful (Oxford: Oxford University Press, 2015).
76. Firestein, Failure.
77. It should go without saying that just as there is a distinction between irreproducibility and fraud, there should be one between retraction and fraud. One should not assume that just because a study has been retracted it is fraudulent. There can be many reasons for retraction, as we shall see.
78. This, however, may be changing—at least in parts of the social sciences—as some new researchers seem eager to make a name for themselves by challenging the work of their more-established colleagues. See Susan Dominus, “When the Revolution Came for Amy Cuddy,” New York Times, Oct. 18, 2017, https://www.nytimes.com/2017/10/18/magazine/when-the-revolution-came-for-amy-cuddy.html.
79. Ian Sample, “Study Delivers Bleak Verdict on Validity of Psychology Experiment Results,” Guardian, Aug. 27, 2015, https://www.theguardian.com/science/2015/aug/27/study-delivers-bleak-verdict-on-validity-of-psychology-experiment-results.
80. B. Nosek et al., “Estimating the Reproducibility of Psychological Science,” Science 349, no. 6251 (Aug. 2015), http://psych.hanover.edu/classes/Cognition/Papers/Science-2015--.pdf. Note that of the 100 studies evaluated, 3 were excluded for statistical reasons, 62 were found to be irreproducible, and only 35 were found to be reproducible.
81. B. Carey, “Many Psychology Findings Not as Strong as Claimed, Study Says,” New York Times, Aug. 27, 2015.
82. B. Carey, “Psychologists Welcome Analysis Casting Doubt on Their Work,” New York Times, Aug. 28, 2015.
83. Joel Achenbach, “No, Science’s Reproducibility Problem Is Not Limited to Psychology,” Washington Post, Aug. 28, 2015.
84. Carey, “Many Psychology Findings.”
85. A. Nutt, “Errors Riddled 2015 Study Showing Replication Crisis in Psychology Research, Scientists Say,” Washington Post, March 3, 2016.
86. B. Carey, “New Critique Sees Flaws in Landmark Analysis of Psychology Studies,” New York Times, March 3, 2016.
87. Nutt, “Errors Riddled 2015 Study.”
88. P. Reuell, “Study That Undercut Psych Research Got It Wrong: Widely Reported Analysis That Said Much Research Couldn’t Be Reproduced Is Riddled with Its Own Replication Errors, Researchers Say,” Harvard Gazette, March 3, 2016.
89. Carey, “New Critique Sees Flaws.”
90. Reuell, “Study That Undercut Psych Research.”
91. Nutt, “Errors Riddled 2015 Study.”
92. Reuell, “Study That Undercut Psych Research.”
93. Carey, “New Critique Sees Flaws.”
94. Carey, “New Critique Sees Flaws.”
95. Simonsohn, quoted in Carey, “New Critique Sees Flaws.”
96. See http://www.psychologicalscience.org/publications/psychological_science/preregistration.
97. With p-hacking, for instance, can’t we imagine a scientist’s decision about whether to keep a study open to look for more data as evidence of confirmation bias? Unconsciously, they are rooting for their own theory. Would it necessarily feel wrong to want to gather more data to see if it really worked? Here it seems wise to make note of “Hanlon’s razor,” which tells us that we should be reluctant to attribute to malice that which can be adequately explained by incompetence.
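To see why “keeping the study open” is more than a harmless impulse, here is a minimal simulation (my own sketch, not drawn from Simmons et al.; the sample sizes, number of peeks, and random seed are assumptions) of how optional stopping inflates the false-positive rate even when the null hypothesis is true.

```python
# Simulate "peeking": test after 20 subjects per group, then after every 10
# more, and stop as soon as p < .05. Both groups come from the same
# distribution, so every "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies = 2000
false_positives = 0

for _ in range(n_studies):
    a = rng.normal(size=100)
    b = rng.normal(size=100)
    for n in range(20, 101, 10):
        _, p = stats.ttest_ind(a[:n], b[:n])
        if p < 0.05:
            false_positives += 1
            break

print(f"False-positive rate with optional stopping: "
      f"{false_positives / n_studies:.1%} (nominal rate: 5%)")
```

In such a simulation the false-positive rate typically comes out well above the nominal 5 percent, which is why the decision to gather “just a little more data” can bias results even without any intent to deceive.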
98. Kevin deLaplante, The Critical Thinker (podcast), “Cognitive Biases and the Authority of Science,” https://www.youtube.com/watch?v=hZkkY2XVzdw&index=5&list=PLCD69C3C29B645CBC.
99. The task is formidable here, for there are two possible errors: when one waits too long to embrace the truth (as with the reaction to Semmelweis) and when one jumps too soon to a conclusion that outstrips the evidence (as with cold fusion).
6 How the Scientific Attitude Transformed Modern Medicine
It is easy to appreciate the difference that the scientific attitude can make in transforming a previously undisciplined field into one of scientific rigor, for we have the example of modern medicine. Prior to the twentieth century, the practice of medicine was based largely on hunches, folk wisdom, and trial and error. Large-scale experiments were unknown and data were difficult to gather. Indeed, even the idea that one needed to test one’s hypotheses against empirical evidence was rare. All of this changed within a relatively short period of time after the advent of the germ theory of disease in the 1860s and its translation into clinical practice in the early twentieth century.1
We already saw in chapter 1 how Ignaz Semmelweis’s discovery of the cause of childbed fever in 1846 provides a prime example of what it means to have the scientific attitude. We also saw that he was far ahead of his time and that his ideas were met with unreasoned opposition. The scientific attitude that Semmelweis embraced, however, eventually found fruition throughout medicine. At about the same time as Semmelweis’s work, medicine saw the first public demonstration of anesthesia. For the first time, surgeons could take their time doing operations, as they no longer had to wrestle down fully awake patients who were screaming in pain. This did not by itself lower mortality, however, since longer surgeries also meant that patients’ wounds were exposed to the air longer and so had more time to become infected.2 Only after Pasteur demonstrated the role of bacteria in disease and Koch detailed the process of sterilization did the germ theory of disease begin to take root. When Lister introduced antiseptic techniques (which kill the germs) and aseptic surgery (which prevents the germs from entering in the first place) in 1867, it was finally possible to keep the cure from sometimes being worse than the disease.3
From today’s point of view, it is easy to take these advances for granted and to underappreciate how they led to the growth of better quantitative techniques, laboratory analysis, controlled experimentation, and the idea that diagnosis and treatment should be based on evidence rather than intuition. But one should not forget that Western medicine has always fancied itself to be scientific; it is just that the meaning of the term has changed.4 Astrological medicine and bloodletting were once considered cutting edge, based on the highest principles of rationality and experience. One would be hard-pressed to find any eighteenth-century doctor—or I imagine even one in the early Greek era—who did not consider his knowledge “scientific.”5 How can one claim, then, that these early physicians and practitioners were so woefully benighted? As we have seen, such things are judged by the members who make up one’s profession and so are relative to the standards of the age—but according to the standards of the early nineteenth century, bloodletting seemed just fine.
My goal in this chapter is not to disparage the medical beliefs of any particular period, even as practitioners cycled from untested beliefs to outrageous hypotheses, sometimes killing their patients with what we would today recognize as criminal incompetence. Instead, I would like to shine a light on how medicine found its way out of these dark ages and moved toward a time when its practice could be based on careful observation, calculation, experiment, and the flexibility of mind to accept an idea when (and only when) it had been empirically demonstrated. Remember that the scientific attitude requires not just that we care about evidence (for what counts as evidence can change from age to age) but that we are willing to change our theory based on new evidence. It is this last part that is crucial when some area of inquiry is trying to make the jump from pseudoscience to science, from mere opinion to warranted belief. For it was only when this mindset was finally embraced—when physicians stopped thinking that they already had all the answers based on the authority of tradition and began to realize that they could learn from experiment and from the experience of others—that medicine was able to come forward as a science.
The Barbarous Past
In his excellent book The History of Medicine: A Very Short Introduction, William Bynum reminds us that—despite bloodletting, toxic potions, skull drilling, and a host of “cures” that were all too often worse than the disease—Western medicine has always considered itself modern.6
One of the greatest systems of medicine in antiquity came from Hippocrates, whose chief insight (besides the Hippocratic Oath for which he is famous) involved his theory of how the “four humors” could be applied to medicine. As Roy Porter puts it in his masterful work The Greatest Benefit to Mankind:
From Hippocrates in the fifth century BC through to Galen in the second century AD, “humoral medicine” stressed the analogies between the four elements of external nature (fire, water, air and earth) and the four humours or bodily fluids (blood, phlegm, choler or yellow bile and black bile), whose balance determined health.7