The Scientific Attitude


by Lee McIntyre


  Like the p-hacking “crisis,” one might see this as a serious challenge to the claim that science is special, and to the claim that scientists are distinctive in their commitment to a set of values that honor basing one’s beliefs on empirical evidence. But again, much can be learned by examining how the scientific community has responded to this replication “crisis.”

  Let’s start with a closer look at the crisis itself. In August 2015, Brian Nosek and colleagues published a paper entitled “Estimating the Reproducibility of Psychological Science” in the prestigious journal Science. It dropped like a bombshell. A few years earlier Nosek had cofounded the Center for Open Science (COS), whose mission was to increase data sharing and the replication of scientific experiments. Its first project was entitled the Reproducibility Project. Nosek recruited 270 other researchers to help him attempt to reproduce one hundred psychological studies. And what they found was shocking. Out of one hundred studies, only 36 percent of the findings could be reproduced.80

  It is important to point out here that Nosek et al. were not claiming that the irreproducible studies were fraudulent. In every case they found an effect in the same direction that the original researchers had indicated. It’s just that in most of them the effect size was cut in half. The evidence was not as strong as the original researchers had said it was, so the conclusion of the original papers was unsupported in over half of the cases. Why did this happen? One’s imagination immediately turns to all of the “degrees of freedom” that were discussed earlier in this chapter. From too small a sample size to analytical error, the replicators found a number of different shortcomings.

  This led to headline announcements throughout the world heralding a replication crisis in science. And make no mistake, the fallout was not just for psychology. John Ioannidis (whom we remember for his earlier paper “Why Most Published Research Findings Are False”) said that the problem could be even worse in other fields, such as cell biology, economics, neuroscience, or clinical medicine.81 “Many of the biases found in psychology are pervasive,” he said.82 Others agreed.83 There were, of course, some skeptics. Norbert Schwarz, a psychologist from the University of Southern California, said, “There’s no doubt replication is important, but it’s often just an attack, a vigilante exercise.” Although he was not involved in any of the original studies, he went on to complain that replication studies themselves are virtually never evaluated for errors in their own design or analysis.84

  Yet this is exactly what happened next. In March 2016, three researchers from Harvard University (Daniel Gilbert, Gary King, and Stephen Pettigrew) and one of Nosek’s colleagues from the University of Virginia (Timothy Wilson) released their own analysis of the replication studies put forward by the Center for Open Science and found that “the research methods used to reproduce those studies were poorly designed, inappropriately applied and introduced statistical error into the data.”85 When a reanalysis of the original studies was done using better techniques, the reproducibility rate was close to 100 percent. As Wilson put it, “[the Nosek study] got so much press, and the wrong conclusions were drawn from it. … It’s a mistake to make generalizations from something that was done poorly, and this we think was done poorly.”86

  This is a breathtaking moment for those who care about science. And, I would argue, it is one of the best demonstrations of the critical attitude behind science that one could imagine. A study was released that found flawed methodology in some studies, and—less than seven months later—it was revealed that there were flaws in the study that found the flaws. Instead of drawing a “Keystone Cops” conclusion, it is better to say that in science we have a situation where the gatekeepers also have gatekeepers—where there is a system of checking and rechecking in the interest of correcting errors and getting closer to the truth.

  What were the errors in Nosek’s studies? They were numerous.

  (1)  Nosek et al. took a nonrandom approach in selecting which studies to try to replicate. As Gilbert explains, “what they did is create an idiosyncratic, arbitrary list of sampling rules that excluded the majority of psychology subfields from the sample, that excluded entire classes of studies whose methods are probably among the best in science from the sample. … Then they proceeded to violate all of their own rules. … So the first thing we realized was that no matter what they found—good news or bad news—they never had any chance of estimating the reproducibility of psychological science, which is what the very title of their paper claims they did.”87

  (2)  Some of their studies were not even close to exact replications. In one of the most egregious examples, the COS team was trying to recreate a study on attitudes toward affirmative action that had been done at Stanford. The original study had asked white students at Stanford to watch a video of four other Stanford students—three white and one black—talking about race, during which one of the white students made an offensive comment. It was found that the observers looked significantly longer at the black student when they believed that he could hear the comment than when they believed that he could not. In their attempt at replication, however, the COS team showed the video to students at … the University of Amsterdam. Perhaps it is not shocking that Dutch students, watching American students speaking in English, would not have the same reaction. But here is where the real trouble came: when the COS team realized the problem, reran the experiment at a different American university, and got the same result as the original Stanford study, they chose to leave this successful replication out of their report but included the one from Amsterdam that had failed.88

  (3)  There was no agreed-upon protocol for quantitative analysis before the replication. Instead, Nosek and his team used five different measures (including strength of the effect) to look at all of the results combined. It would have been better, Gilbert et al. recommended, to focus on one measure.89

  (4)  The COS team failed to account for how many of the replications would have been expected to fail purely by chance. It is statistically indefensible to expect a 100 percent success rate in replications. King explains, “If you are going to replicate 100 studies, some will fail by chance alone. That’s basic sampling theory. So you have to use statistics to estimate how many studies are expected to fail by chance alone because otherwise the number that actually do fail is meaningless.”90 (A simple simulation of this point appears in the sketch below.)
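  To make King’s sampling-theory point concrete, here is a minimal sketch in Python. It is not taken from Nosek’s paper or from the Gilbert et al. critique; the statistical power values are illustrative assumptions, chosen only to show that even if every original finding were true, some replications would be expected to fail by chance.

```python
# Illustrative sketch (not from the Reproducibility Project): how many of
# 100 true findings would we expect to fail to replicate, given that each
# replication has less than 100 percent statistical power? The power values
# below are assumptions for illustration, not reported figures.

import random

def expected_failures(n_studies=100, assumed_power=0.8, n_trials=10_000):
    """Average number of 'failed' replications across simulated trials,
    where each replication of a true effect succeeds with probability
    assumed_power and fails with probability 1 - assumed_power."""
    total = 0
    for _ in range(n_trials):
        total += sum(1 for _ in range(n_studies)
                     if random.random() > assumed_power)
    return total / n_trials

if __name__ == "__main__":
    for power in (0.9, 0.8, 0.6):
        print(f"assumed power {power:.0%}: "
              f"~{expected_failures(assumed_power=power):.0f} of 100 "
              f"true findings expected to fail by chance")
```

  On these assumptions, a batch of one hundred perfectly true findings would still produce roughly ten to forty failed replications, which is why a raw failure count is meaningless without an estimate of the chance-failure baseline.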

  The result of all these errors is that—if you take them into account and undo them—the reproducibility rate for the original one hundred studies was “about what we should expect if every single one of the original findings had been true.”91 It should be noted here that even so, Gilbert, King, Pettigrew, and Wilson are not suggesting fraud or any other type of malfeasance on the part of Nosek and his colleagues. “Let’s be clear,” Gilbert said, “No one involved in this study was trying to deceive anyone. They just made mistakes, as scientists sometimes do.”92

  Nosek nonetheless complained that the critique of his replication study was highly biased: “They are making assumptions based on selectively interpreting data and ignoring data that’s antagonistic to their point of view.”93 In short, he accused them of almost precisely what they had accused him of doing—cherry-picking—which is what he had accused some of the original researchers of doing too.

  Who is right here? Perhaps they all are. Uri Simonsohn (one of the coauthors of the previously discussed p-hacking study by Simmons et al.)—who had absolutely nothing to do with this back-and-forth on the replication crisis and its aftermath—said that both the original replication paper and the critique used statistical techniques that are “predictably imperfect.” One way to think about this, Simonsohn said, is that Nosek’s paper said the glass was 40 percent full, whereas Gilbert et al. said it could be 100 percent full.94 Simonsohn explains, “State-of-the-art techniques designed to evaluate replications say it is 40 percent full, 30 percent empty, and the remaining 30 percent could be full or empty, we can’t tell till we get more data.”95

  What are the implications for the scientific attitude? They are manifest in this last statement. When we don’t have an answer to an empirical question, we must investigate further. Researchers used the scientific attitude to critique the study that critiqued other studies. But even here, we have to check the work. It is up to the scientific community as a whole to decide—just as they did in the case of cold fusion—which answer is ultimately right.

  Even if all one hundred of the original studies that were examined by the Reproducibility Project turn out to be right, there needs to be more scrutiny on the issues of data sharing and replication in science. This dust-up—which is surely far from over—indicates the need to be much more transparent in our research and reporting methods, and in trying to replicate one another’s work. (Indeed, look how valuable the reproducibility standard was in the cold fusion dispute.) And already perhaps some good has come of it. In 2015, the journal Psychological Science announced that it would ask researchers to preregister their study methods and modes of analysis prior to data collection, so that these can later be reconciled with what they actually found. Other journals have followed suit.96

  The increased scrutiny that the “reproducibility crisis” has created is a good thing for science. Although it is surely embarrassing for some of its participants, the expectation that nothing should be swept under the rug marks yet another victory for the scientific attitude. Indeed, what other field can we imagine rooting out its own errors with such diligence? Even in the face of public criticism, science remains completely committed to the highest standards of evidence. While there may be individual transgressions, science’s handling of the “replication crisis” has made clear its commitment to the scientific attitude.

  Conclusion

  In this chapter, I’ve come down pretty hard on some of the mistakes of science. My point, however, was not to show that science is flawed (or that it is perfect), even if I did along the way show that scientists are human. We love our own theories. We want them to be right. We surely make errors of judgment all the time in propping up what we hope to be true, even if we are statisticians. My point has not been to show that scientists do not suffer from some of the same cognitive biases, like confirmation bias and motivated reasoning, as the rest of us. Arguably, these biases may even be the reason behind p-hacking, refusal to share one’s data, taking liberties with degrees of freedom, running out to a press conference rather than subjecting oneself to peer review, and offering work that is irreproducible (even when our subject of inquiry is irreproducible work itself). Where the reader might have been tempted in various cases to conclude that such egregious errors could only be the result of intentional oversight—and must therefore reveal fraud—I ask you to step back and consider the possibility that they may be due merely to unconscious cognitive bias.97

  Fortunately, science as an institution is more objective than its practitioners. The rigorous methods of scientific oversight are a check against individual bias. In several examples in this chapter we saw problems that were created by individual ambition and mental foibles that could be rectified through the rigorous application of community scrutiny. It is of course hoped that the scientific attitude would be embraced by all of its individual practitioners. But as we’ve seen, the scientific attitude is more than just an individual mindset; it is a shared ethos that is embraced by the community of scholars who are tasked with judging one another’s theories against publicly available standards. Indeed, this may be the real distinction between science and pseudoscience. It is not that pseudoscientists suffer from more cognitive bias than scientists. It is not even that scientists are more rational (though I hope this is true). It is instead that science has made a community-wide effort to create a set of evidential standards that can be used as a check against our worst instincts and make corrections as we go, so that scientific theories are warranted for belief by those beyond the individual scientists who discovered them. Science is the best way of finding and correcting human error on empirical matters not because of the unusual honesty of scientists, or even their individual commitment to the scientific attitude, but because the mechanisms for doing so (rigorous quantitative method, scrutiny by one’s peers, reliance on the refutatory power of evidence) are backed up by the scientific attitude at the community level.

  It is the scientific attitude—not the scientific method—that matters. As such, we need no longer pretend that science is completely “objective” and that “values” are irrelevant. In fact, values are crucial to what is special about science. Kevin deLaplante says, “Science is still very much a ‘value-laden’ enterprise. Good science is just as value-laden as bad science. What distinguishes good from bad science isn’t the absence of value judgments, it’s the kind of value judgments that are in play.”98

  There are still a few problems to consider. First, what should we do about those who think that they have the scientific attitude when they don’t? This concern will be dealt with in chapter 8, where we will discuss the problems of pseudoscience and science denialism. I trust, however, that one can already begin to imagine the outlines of an answer: it is not just an individual’s reliance on the scientific attitude—and surely not his or her possibly mistaken belief about fulfilling it—that keeps science honest. The community will judge what is and is not science. Yet now a second problem arises, for is it really so easy to say that the “community decides”? What happens when the scientific community gets it wrong? In chapter 8, I will also deal with this problem, via a consideration of those instances where the individual scientist—like Galileo—was right and the community was wrong.99 As an example I will discuss the delightful, but less well-known, example of Harlen Bretz’s theory that a megaflood was responsible for creating the eastern Washington State scablands. I will also offer some reasons for disputing the belief that today’s climate change deniers may just be tomorrow’s Galileo. And, as promised, I will examine the issue of scientific fraud in chapter 7.

  Before all that, however, in the next chapter, we will take a break from consideration of scientific drawbacks and failures to take a closer look at one example of an unabashed success of the scientific attitude and how it happened: modern medicine.
