But Planet X was a slippery thing.12 By the time Pluto was discovered (the presumed heir to Planet X, though the two turned out to be quite distinct objects), the mass of Planet X had been calculated no fewer than four times. As the estimating continued, based on aberrations in Neptune’s orbit, it kept getting smaller.
The first estimate, made in 1901, put Planet X at nine times the mass of the Earth. By 1909, it was down to only five times the Earth’s mass. By 1919, Planet X was expected to be only twice the Earth’s mass.
Of course, we now know that Pluto, as mentioned earlier, is small compared to these estimates. While it’s not likely to evaporate anytime soon, whatever Dessler and Russell might say, Pluto is far smaller than the expected Planet X.
So, even after Pluto’s discovery, astronomers continued to examine the unexplained properties of Neptune’s and Uranus’s orbits, each time recognizing that Planet X didn’t have to be as large as previously thought. Now the consensus seems to be that Planet X does not exist. Thanks to the Voyager missions, we now have a much better handle on Neptune’s mass, and between that and other increasingly precise measurements, it seems that Planet X is not needed to account for what was previously measured.
Unlike Pluto, which won’t actually vanish, it seems that Planet X already has.
Such a story in the physical sciences—where certain effects and unexplained phenomena decrease in size over time—is rare. However, when it comes to the realm of biology, social sciences, or even medicine, where measurements are not always as clear, and the results are often much more noisy (due to messy issues such as human actions), this problem is much more common. It’s known as the decline effect. In some situations, repeated examination of an effect or a phenomenon yields results that decrease in magnitude over time. In addition to facts themselves having a half-life, the decline effect states that facts can sometimes decay in their impact or their magnitude.
While some have made this out to be somewhat mysterious, that needn’t always be the case, as shown in the example of Planet X. Increasingly precise measurement allows us to often be more accurate in what we are looking for. And these improvements frequently dial the effects downward.
But the decline effect is not only due to measurement. One other factor involves the dissemination of measurements, and it is known as publication bias. Publication bias is the idea that the collective scientific community and the community at large only know what has been published. If there is any sort of systematic bias in what is being published (and therefore publicly measured), then we might only be seeing some of the picture.
The clearest example of this is in the world of negative results. If you recall, John Maynard Smith noted that “statistics is the science that lets you do twenty experiments a year and publish one false result in Nature.” However, if it were one experiment being replicated by twenty separate scientists, nineteen of these would be a bust, with nineteen careers unable to move forward. Annoying, certainly, and it might feel as if it were a waste of these scientists’ time, but that’s how science operates. Most ideas and experiments are unsuccessful. Scientists recognize that the ups and downs are an inherent part of the game. But crucially, unsuccessful results are rarely published.
However, for that one scientist who received an erroneous result (and its associated wonderfully low p-value), there is a great deal of excitement. Through no fault of his own—it’s due to statistics—he has found an intriguing phenomenon, and quite happily publishes it.
However, should someone else care to try to replicate his work, she might well find no effect at all, or at most a much smaller one, since the original experiment should never have been counted a success in the first place.
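Maynard Smith’s quip is simple arithmetic. At the conventional p < 0.05 threshold, each experiment on a true-null hypothesis has a 5 percent chance of producing a spurious “significant” result. A minimal sketch, using only the numbers from the quote:

```python
# The arithmetic behind the "twenty experiments" quip: at the conventional
# p < 0.05 threshold, each experiment on a true-null hypothesis has a 5%
# chance of yielding a spurious "significant" result by chance alone.
alpha = 0.05          # significance threshold
n_experiments = 20    # experiments per year, all assumed to test null effects

expected_false_positives = alpha * n_experiments          # exactly 1.0
prob_at_least_one = 1 - (1 - alpha) ** n_experiments      # roughly 0.64

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Chance of at least one:   {prob_at_least_one:.0%}")
```

So even a scientist running nothing but doomed experiments has better-than-even odds of one publishable “result” a year.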
Such is the way of science. A man named John Ioannidis is one of the people who has delved even deeper into the soft underbelly of science, in order to learn more about the relationship between measurement and error.
• • •
JOHN Ioannidis is a Greek physician and professor at the University of Ioannina School of Medicine, and he is obsessed with understanding the failings and more human properties of the scientific process. Rather than looking at anecdotal examples, such as the case of Pluto, he aggregates many cases together in order to paint a clearer picture of how we learn new things in science. He has studied the decline effect himself, finding its consistent presence within the medical literature. He has found that for highly cited clinical trials,13 initially significant and large effects are later found to have smaller effects or often no effect at all in a nontrivial number of instances.
Looking within the medical literature over a period of nearly fifteen years, Ioannidis examined the most highly cited studies. Of the forty-five papers he examined, seven of them (over 15 percent) initially had higher effects, and another seven were contradicted outright by later research. In addition, nearly a quarter were never even tested again, meaning that there could have been many more false results in the literature, but since no one’s tested them, we don’t know.
The research that had initially higher effects ranged across many areas of study. From treatment of HIV to angioplasty or strokes, none of these areas were immune to the decline effect. And, of course, a similar range of areas was affected by contradictions: coronary artery disease, vitamin E research, nitric oxide, and more. As the saying among doctors goes, “Hurry up and use a new drug while it still works.”
What was the cause of the decline effect here? Did Ioannidis ascribe this to anything new? Far from being the result of anything spectacular or confusing, the decline effect often comes down to a matter of replication and importance. The more something is tested, the better we understand it. Often, more important areas are those that are tested more frequently. It is likely that there are a good deal more incorrect effects out there in the medical literature than we are even aware of, just waiting to be tested.
Of course, it’s not always this clear. As Ioannidis noted:
Whenever new research fails to replicate early claims for efficacy or suggests that efficacy is more limited than previously thought, it is not necessary that the original studies were totally wrong and the newer ones are correct simply because they are larger or better controlled. Alternative explanations for these discrepancies may include differences in disease spectrum, eligibility criteria, or the use of concomitant interventions.
We should be wary of jumping to conclusions.
Nevertheless, in consonance with the idea of increasing precision and p-values, Ioannidis wrote:
In the case of initially stronger effects, the differences in the effect sizes could often be within the range of what would be expected based on chance variability. This reinforces the notion that results from clinical studies, especially early ones, should be interpreted using not only the point estimates but also the uncertainty surrounding them.
More recently, Ioannidis conducted the same sort of analysis for various biomarkers14 and found that subsequent meta-analyses often reported diminished effects. We must always be aware of the fact that we are dwelling in uncertainty. Forgetting this can make us jump to unwarranted conclusions.
• • •
THESE contradicted effects are related to what is perhaps Ioannidis’s most well-known paper,15 which has acted as a sort of broadside against many aspects of how science is done. His 2005 paper in the journal PLoS Biology was titled “Why Most Published Research Findings Are False.” As of late 2011, it had been viewed more than four hundred thousand times and cited more than eight hundred times.
He lays out very clearly a mathematical argument for why many scientific claims are untrue. Elaborating on several of the themes already discussed, what he looks for are cases of false positives: instances where a finding is “discovered” even though it’s not actually real.
In a wonderful bit from The Daily Show, correspondent John Oliver interviews Walter Wagner, a science teacher who tried to prevent, via lawsuit, the Large Hadron Collider from being turned on. The Large Hadron Collider is a massive particle accelerator capable of generating huge amounts of energy, and Wagner was concerned that it could create a black hole capable of destroying the earth.
When Oliver presses Wagner on the chances that the world will be destroyed, he states that “the best we can say right now is about a one in two chance.” Wagner bases this on the idea that it will either happen or it won’t, so therefore it must be 50-50.
But this is absurd. Prior to the testing of a hypothesis, there is a certain expectation of what might happen. As another scientist interviewed by The Daily Show stated, there is a 0 percent chance of the earth being destroyed, based on what we already know about the fundamental laws of physics and how particle accelerators work.
This probability—what we expect to occur when we test a hypothesis—is known as the prior probability. The prior probability is simply the probability that the hypothesis is true prior to testing. Once we’ve tested it, we then get something known as a posterior probability: the probability that it is true, after our test.
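Bayes’ rule makes the step from prior to posterior concrete. As a sketch, with entirely hypothetical numbers chosen for illustration: suppose a hypothesis has a 10 percent prior probability of being true, and our experiment detects real effects 80 percent of the time while false-alarming 5 percent of the time.

```python
# A minimal sketch of a Bayesian update: prior probability in, posterior out.
# The parameter values in the example below are hypothetical.
def posterior(prior, power, false_positive_rate):
    """P(hypothesis true | experiment came back positive), via Bayes' rule."""
    true_positive = power * prior                        # real effect, detected
    false_positive = false_positive_rate * (1 - prior)   # no effect, false alarm
    return true_positive / (true_positive + false_positive)

# A long-shot hypothesis, tested once with a positive result:
print(posterior(prior=0.10, power=0.80, false_positive_rate=0.05))  # ≈ 0.64
```

A single positive result moves a 10 percent long shot only to about 64 percent, far from certainty.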
Ioannidis argues that in a given field there is a certain fraction of relationships between variables that are real but many more that are spurious. For each field, then, there is a ratio of the relationships that are real to those that aren’t. Think of it as the ratio between smoking-causes-cancer hypotheses and green-jelly-beans-cause-acne hypotheses.
Ioannidis then uses this ratio, along with something known as our hypothetical experiment’s discriminating power—a number that encapsulates the ability of the experiment to actually yield a positive result—to calculate whether the experimental result is valid.
Essentially, in a quantitative way, he shows that in a large number of situations—whether because the study is done in a field in which the above ratio is fairly low, implying that the probability of a spurious relationship is high, or because the experiment uses very few subjects, or because the study is done in an area where replication of results doesn’t occur—statistically significant and publishable results can occur even though they are not actually true.
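The simplest version of this calculation, setting aside the bias terms in the full paper, gives the probability that a “significant” finding is actually true. The power and significance values below are conventional illustrative choices, not taken from the text:

```python
# Positive predictive value (PPV): the chance that a statistically
# significant finding reflects a real relationship. R is the field's
# pre-study odds: the ratio of true relationships to spurious ones tested.
def ppv(R, alpha=0.05, power=0.80):
    true_hits = power * R      # real relationships correctly detected
    false_hits = alpha         # spurious relationships that slip past the test
    return true_hits / (true_hits + false_hits)

# A long-shot field (1 real relationship per 100 tested): most findings false.
print(ppv(R=1 / 100))   # ≈ 0.14
# A field with even prior odds: findings are usually trustworthy.
print(ppv(R=1))         # ≈ 0.94
```

The same machinery shows why small samples (lower power) and flexible analyses (a higher effective alpha) drag the truth rate down further.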
Ioannidis helpfully provides a few corollaries of his analysis that grow out of common sense, and I’ve added my own annotations:
The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. If a study is small, it can yield a positive result more easily due to random chance. This is like the classic clinical trial joke, in which, upon testing a new pharmaceutical on a mouse population, it was reported that one-third responded positively to the treatment, one-third had no response, and the third mouse ran away.
The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. If an effect is small, it could be like Planet X, and we are simply measuring noise.
The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. More experiments mean that some of them might simply be right due to chance, and get published.
The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. If there’s a greater possibility of massaging the data to get a good result, then there’s a greater chance that someone will do so.
The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Since scientists are people too, and are not perfect beings, the greater the possible bias, the greater the chance the findings aren’t true.
The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true. More teams mean that any positive result gets a great deal of hype quite rapidly, and is pushed out the door quickly, but leads to research that can be easily refuted, with an equal amount of hype. Ioannidis refers to this as a cause of the Proteus phenomenon, which he defined as “rapidly alternating extreme research claims and extremely opposite refutations.”
• • •
ONE simple way to minimize a lot of this trouble is through replication, measuring the same problem over and over. Too often it’s much more glamorous to try to discover something new than to simply do someone else’s experiment a second time. In addition, many scientists, even those who want to replicate findings, find it difficult to do so. Especially when they think a result is actually wrong, there is even more of a disincentive.
Why is this so? Regarding a kerfuffle16 about the possibility of bacteria that can incorporate arsenic into their DNA backbone—a paper published in Science—Carl Zimmer explains:
But none of those critics had actually tried to replicate the initial results. That would take months of research: getting the bacteria from the original team of scientists, rearing them, setting up the experiment, gathering results and interpreting them. Many scientists are leery of spending so much time on what they consider a foregone conclusion, and graduate students are reluctant, because they want their first experiments to make a big splash, not confirm what everyone already suspects.
“I’ve got my own science to do,” John Helmann, a microbiologist at Cornell and a critic of the Science paper, told Nature.
Or to put it more starkly, as Stephen Cole, a sociologist of science at the State University of New York, Stony Brook, quoted one scientist, “If it confirmed the first researcher’s findings,17 it would do nothing for them [the team performing the replication], but would win a Nobel Prize for him, while on the other hand, if it disconfirmed the results there would be nothing positive to show for their work.”
But only through replication can science be the truly error-correcting enterprise that it is supposed to be. Replication allows for the overturning of results, as well as an approach toward truth, and is what science is ultimately about. In a paper that followed up on Ioannidis’s somewhat pessimistic conclusion, researchers calculated that a small amount of replication18 can lead us to much more robust science. But how do we do this?
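One way to see why even a little replication goes so far is a back-of-the-envelope calculation (my own illustration, not taken from the paper cited): requiring an independent successful replication before accepting a finding squares the false positive rate, while the power to detect real effects drops only modestly.

```python
# An idealized sketch of how one independent replication boosts the chance
# that an accepted finding is true. Assumes two identical, independent
# studies; the alpha and power values are conventional, illustrative choices.
def ppv(R, alpha, power):
    # Positive predictive value: P(relationship is real | result is positive).
    # R is the pre-study odds of true to spurious relationships in the field.
    return (power * R) / (power * R + alpha)

R = 1 / 100  # a long-shot field: 1 real relationship per 100 tested

single_study = ppv(R, alpha=0.05, power=0.80)
# Demanding that both the original study and a replication come up positive
# means alpha -> alpha**2 while power -> power**2.
replicated = ppv(R, alpha=0.05**2, power=0.80**2)

print(f"One study:            {single_study:.0%}")   # ≈ 14%
print(f"Plus one replication: {replicated:.0%}")     # ≈ 72%
```

Under these idealized assumptions, a single replication turns a mostly-false literature into a mostly-true one.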
A number of scientists are trying to make it more acceptable, and easier, to publish negative results. Since science prioritizes the exciting and the surprising, it is nearly impossible to publish a paper that says that some hypothesis is false. In fact, unless the work overturns some well-known result or dogma, the publication will never receive a hearing. Many scientists are advocating for journals and databases devoted to publicizing negative results to fill this publishing void, and have begun such journals. These could act as a check on the positive results so often seen in the literature and help provide a handle on the nature of the decline effect. In addition, they have the potential to act as a series of guideposts for other scientists, allowing them to see what hasn’t worked before so they can steer clear of unsuccessful research.
• • •
SCIENCE is not broken. Lest the above worry the reader, science is far from a giant erroneous mass. But how do we return from the brink, where error and sloppy results might appear to be widespread?
Luckily, the truly erroneous and sloppy aspects of science are rare. While they do occur in some instances, science as a whole still moves forward.
As Lord Florey, a president of the Royal Society, stated:19
Science is rarely advanced by what is known in current jargon as a “breakthrough,” rather does our increasing knowledge depend on the activity of thousands of our colleagues throughout the world who add small points to what will eventually become a splendid picture much in the same way the Pointillistes built up their extremely beautiful canvasses.
Science is not always cumulative,20 as the philosopher of science Thomas Kuhn has noted. There are setbacks, mistakes, and wrong turns. Nonetheless, we have to distinguish the core of science from the frontier, terms used by SUNY Stony Brook’s Stephen Cole. The core is the relatively stable portion of what we know in a certain field, the facts we don’t expect to change. While it’s no doubt true that we will learn new things about how DNA works and how our genes are turned on and off, it’s unlikely that the basic mechanism of encoding genes in DNA is some sort of mesofact. While this rule of how DNA contains the information for proteins—known as the central dogma of biology—has become more complex over time, its basic principles are part of the core of our knowledge. This is what is generally considered true by consensus within the field, and often makes its way into textbooks.
On the other hand, the frontier is where most of the upheaval of facts occurs, from the daily churn in what the newspapers tell us is healthy or unhealthy, to the constant journal retractions, clarifications, and replications. That’s where the scientists live, and in truth, that’s where the most exciting stuff happens. The frontier is where scientists often lack a clear idea of what will become settled truth.
As John Ziman, a theoretical physicist who thought deeply about the social aspects of science, noted:
The scientific literature is strewn21 with half-finished work, more or less correct but not completed with such care and generality as to settle the matter once and for all. The tidy comprehensiveness of undergraduate Science, marshalled by the brisk pens of the latest complacent generation of textbook writers, gives way to a nondescript land, of bits and pieces and yawning gaps, vast fruitless edifices and tiny elegant masterpieces, through which the graduate student is expected to find his way with only a muddled review article as a guide.
And pity the general public trying to make sense of this.
The errors at the frontier are many, from those due to measurement or false positives, to everything else that this book has explored. But it’s what makes science exciting. Science is already a terribly human endeavor, with all the negative aspects of humanity. But we can view all of this uncertainty in a positive light as well, because science is most thrilling and exciting when it’s unsettled.
The Half-Life of Facts Page 17