Ending Medical Reversal
:: PREMATURE TERMINATION OF TRIALS
Beyond error rates, the early termination of trials is another factor that can keep even the best RCTs from settling questions of causality. Imagine you and a friend are playing basketball. He tells you that he is a perfect free-throw shooter.
“What do you mean perfect? Like 90 percent?” you ask.
He shakes his head. “100 percent.”
“Come on, that’s impossible,” you say, knowing that the highest single-season free-throw percentage in NBA history was 98 percent.* There is no way your friend is better than that. “Prove it!”
Your friend steps up to the line and shoots—he makes it. He shoots again—he makes it. One more time—he makes it.
“There you have it,” he says.
“Are you kidding me? You made three shots?”
“Three out of three is 100 percent,” he says, and walks away.
You are not satisfied.
So, what happened? The problem here was early termination. By dumb luck nearly anyone can make three free throws in a row, but that does not make the person José Calderón. The same is true in clinical trials. An intervention can look exceptionally good, especially if there are only a few outcomes, but if you keep collecting data, the real benefit will become clear.
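To put a rough number on that luck, here is a minimal sketch in Python. The 75 percent shooter is an assumed figure chosen for illustration, not a number from the story, and the 5 percent cutoff is likewise just a common convention.

```python
import math

# Assumption for illustration: a decent but far-from-perfect shooter.
true_accuracy = 0.75

# Chance that such a shooter makes three free throws in a row.
prob_three_straight = true_accuracy ** 3
print(f"P(3 of 3) = {prob_three_straight:.3f}")

# How many consecutive makes are needed before a perfect streak becomes
# unlikely (under 5 percent) for this shooter?
# Solve true_accuracy ** n < cutoff for the smallest whole n.
cutoff = 0.05
shots_needed = math.ceil(math.log(cutoff) / math.log(true_accuracy))
print(f"Shots needed: {shots_needed}")
```

Under these assumptions, a 75 percent shooter goes three for three about 42 percent of the time, and it takes 11 straight makes before a perfect streak would occur by luck less than 5 percent of the time.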
Why are clinical trials terminated early? We consider it to be unethical to continue a trial once we know that the intervention we are testing is clearly superior (or clearly inferior) to the control. It is no longer fair to the people in the control group (or treatment group) who were recruited with the understanding that we did not know which treatment was superior. However, this perceived ethical necessity can impede the quest for truth. One recent analysis looked at 63 clinical questions for which we had hundreds of studies. Of those, 91 studies were terminated early, while 424 matching studies were completed. The researchers found that studies of an intervention that were stopped early claimed larger benefits of the treatment than those studies of the same intervention that ran to a preplanned conclusion. On average, for a given clinical question, if a trial was run to its conclusion and found no effect, a study of the same intervention that was stopped early concluded the treatment was associated with a 29 percent benefit. Just like our basketball shooter, a few lucky breaks, early on, can provide a misleading outcome.
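The same effect can be reproduced in a small simulation. The sketch below is illustrative only: the event rate, trial size, interim-look schedule, and the naive stopping rule are all assumptions made for the example, not the methods of the analysis described above (real trials typically use formal stopping boundaries such as O'Brien-Fleming). Every simulated treatment has no true effect, so any apparent benefit is luck.

```python
import random
import statistics

def run_trial(rng, event_rate=0.30, per_arm=400, look_every=50, z_stop=2.0):
    """Simulate one two-arm trial in which the treatment truly does nothing.

    Returns (stopped_early, observed_relative_risk_reduction).
    All parameter values are hypothetical choices for illustration.
    """
    events_control = events_treated = 0
    for n in range(1, per_arm + 1):
        # Both arms share the same true event rate: no real benefit.
        events_control += int(rng.random() < event_rate)
        events_treated += int(rng.random() < event_rate)
        if n % look_every == 0 and n < per_arm:
            # Naive interim look: stop if the treatment seems clearly better.
            p_c, p_t = events_control / n, events_treated / n
            pooled = (events_control + events_treated) / (2 * n)
            se = (2 * pooled * (1 - pooled) / n) ** 0.5
            if se > 0 and (p_c - p_t) / se > z_stop:
                return True, 1 - p_t / p_c
    p_c, p_t = events_control / per_arm, events_treated / per_arm
    return False, (1 - p_t / p_c) if p_c > 0 else 0.0

rng = random.Random(2015)  # fixed seed so the run is repeatable
results = [run_trial(rng) for _ in range(2000)]
early = [rrr for stopped, rrr in results if stopped]
completed = [rrr for stopped, rrr in results if not stopped]

early_mean = statistics.mean(early)
completed_mean = statistics.mean(completed)
print(f"Stopped early: {len(early)} trials, mean apparent benefit {early_mean:.1%}")
print(f"Ran to completion: {len(completed)} trials, mean apparent benefit {completed_mean:.1%}")
```

Because a trial in this sketch stops early only when the treatment happens to look good at an interim look, the stopped-early group reports a sizable benefit for a treatment that does nothing, while the trials that ran to completion average close to no effect.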
:: META-ANALYSIS
If a single, well-done RCT is good, is the combination of multiple RCTs better? A meta-analysis is a study that quantitatively combines multiple RCTs. Meta-analyses are useful in many ways. They summarize the data from multiple RCTs that study a single topic. They are especially useful when multiple small RCTs have been done but no one single study was powerful enough to reach a conclusion. Sometimes combining these studies can reach a clear conclusion. The disadvantage of meta-analyses is that they study studies rather than patients. Researchers must therefore make choices about which studies to include in their analysis and open themselves to biases inherent in the publication process—not every study has an equal chance of being published.
So which is better, a single, well-done RCT or a meta-analysis of the same question? It would seem that, given the great size, in terms of patients, of a meta-analysis and its ability to consider a more diverse group of patients (owing to the fact that it will include multiple, diverse RCTs), a meta-analysis might win out. This issue was first studied in the late 1990s. A group of researchers compared the results of 12 large randomized trials—with more than 1,000 patients in each arm—with 19 meta-analyses that addressed the same topics but were published prior to the large RCTs. They found significant disagreement between the two methods. If the meta-analysis was positive, only 68 percent of the RCTs were also positive; and if the meta-analysis was negative, 67 percent of the RCTs were negative. Nearly one-third of the time, the results of the meta-analysis and the RCTs were at odds.
When a large, well-done RCT reaches one conclusion and a meta-analysis of the same topic comes to a different one, which is right? Individual RCTs can have exaggerated results because of early termination or idiosyncrasies in the population studied. Thus, if just one or two individual trials are positive and the meta-analysis is negative, we tend to have doubts that the treatment is effective. Conversely, if an excellent and large randomized trial is negative but a meta-analysis is positive, we would be concerned that there is bias at play in the meta-analysis, most likely publication bias. Publication bias occurs because small positive RCTs are more likely to be published than small negative ones. Thus, the medical literature as a whole tends to be biased toward positive results, and meta-analyses will reflect this bias. Finally, if several medium-sized randomized trials and a meta-analysis agree that a treatment is beneficial, then that finding is probably really true.
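Publication bias can be illustrated with a similar sketch. Everything here is an assumption for the sake of the example: the trial size, the event rate, the rule that every apparently positive small trial is published while only 30 percent of the others are, and the simple pooling of raw counts in place of a formal meta-analytic model. Again, the simulated treatment has no true effect.

```python
import random

def simulate_small_trial(rng, per_arm=40, event_rate=0.30):
    """One small two-arm trial of a treatment with no true effect."""
    events_control = sum(rng.random() < event_rate for _ in range(per_arm))
    events_treated = sum(rng.random() < event_rate for _ in range(per_arm))
    return events_control, events_treated, per_arm

def looks_positive(events_control, events_treated, per_arm, z_cut=1.64):
    """Naive one-sided test: does the treatment arm look clearly better?"""
    p_c, p_t = events_control / per_arm, events_treated / per_arm
    pooled = (events_control + events_treated) / (2 * per_arm)
    se = (2 * pooled * (1 - pooled) / per_arm) ** 0.5
    return se > 0 and (p_c - p_t) / se > z_cut

def pooled_risk_difference(trials):
    """Pool raw counts across equal-sized trials (control rate minus treated rate)."""
    total_control = sum(c for c, _, _ in trials)
    total_treated = sum(t for _, t, _ in trials)
    total_per_arm = sum(n for _, _, n in trials)
    return (total_control - total_treated) / total_per_arm

rng = random.Random(7)  # fixed seed so the run is repeatable
all_trials = [simulate_small_trial(rng) for _ in range(2000)]

# Hypothetical publication rule: positive trials always appear in print,
# the rest only 30 percent of the time.
published = [t for t in all_trials if looks_positive(*t) or rng.random() < 0.30]

effect_all = pooled_risk_difference(all_trials)
effect_published = pooled_risk_difference(published)
print(f"All {len(all_trials)} trials pooled: {effect_all:+.3f}")
print(f"{len(published)} published trials pooled: {effect_published:+.3f}")
```

A meta-analysis that can see only the published trials pools a literature tilted toward lucky positive results, so its estimate drifts toward benefit even though pooling every trial that was actually run would show essentially none.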
THE BEST THERE IS
Evidence showing cause and effect in medicine is complicated. There probably are few parachutes (or saviors pulling us from the path of a bus) in medicine. Because of this, we need randomized controlled trials—true experiments that, when done well, prove causation and provide near certainty about treatments. But even RCTs are not perfect, and we will never have RCTs to provide evidence for every decision that needs to be made in medicine. In fact, one reversal we do not talk about extensively—the use of recombinant activated protein C in sepsis—gained prominence based on one RCT but was contradicted by another one. That said, we have come a long way in terms of basing our management decisions on science. As of 2015, thousands of medical practices have been rigorously tested, including some that would have seemed unthinkable to test just 20 years ago.
We end this chapter, which is dedicated to promoting the singular power and importance of the RCT, with a sobering thought: as doctors, the best we can hope for is that, on average, we improve the health of our patients. As patients, the best we can hope for is that a therapy we are offered has a reasonable likelihood of helping us. Although these seem like modest goals, they are dictated to us by evidence-based medicine. It is irrational to expect certainty that every intervention will help. Doctors like to think they help every patient they meet, and patients expect certainty when prescribed a treatment. The practice of evidence-based medicine can increase our confidence in medicine—when a treatment has been shown to improve survival for a group of patients in a large, well-done RCT, doctors should recommend it and patients should accept the recommendation. But at the same time, we know that some subset of those patients—a subset that we have no idea how to recognize—will not benefit.
Making the world a better, healthier place, on average, is the best we can hope for. This modest goal is an admirable one. However, when medical reversal occurs, the problem is not that a subset of patients did not benefit; it is that doctors did not make people, on average, healthier, and sometimes they made them worse. On the other hand, when a practice is shown to work in a large randomized trial, although not everyone benefits, people do benefit, on average. This is not trivial. In human history, there are few things we have done that we know brought about net good.
PART 3 THE ORIGINS OF REVERSAL
11 SCIENTIFIC PROGRESS, REVOLUTION, AND MEDICAL REVERSAL
EVERY MEDICAL SCHOOL INTERVIEW follows the same script. The interviewer, a doctor taking time out from his practice or research, asks some version of “Why do you want to be a doctor?” The applicant, suited and nervous, responds, “I am interested in science and I want to help people.” The savviest applicants make this response sound more profound, but the meaning is always the same. And why not? Medicine is a wonderful field because it offers the opportunity to apply science toward an end that is pure and good. Because of the intricate link between basic science and clinical medicine, medical progress depends on the scientific method. The scientific method, however, when used in medicine, can lead us to adopting therapies before their time. When this happens, the stage is set for medical reversal.
Before we consider the scientific method and its relationship to medical reversal, let us quickly orient ourselves. Up to this point we have presented the “what” and the “why” of medical reversal. The “what” has been the many examples of medical reversal. The “why” has been the reasons these faulty therapies were adopted in the first place. Every example really had the same cause: the therapy in question was founded on flawed data. We began in chapter 1 discussing medical therapies that were overturned because their use was based on inadequate data: observational studies or mechanistic explanations alone. When these treatments were actually assessed in real, experimental trials, it turned out they were ineffective. In chapter 2, when we discussed subjective end points, we showed how poorly designed studies, those that used inappropriate placebos, have also led to reversal. Chapter 3 extended the list of causes as we covered surrogate end points. Here, evidence was based on end points that are, in themselves, unimportant. When we discussed systems initiatives in chapter 5, we added other types of flawed data, those based on historical controls or studies conducted at a single center.
In part 2 we went even further in explaining the “why.” We explained why randomized controlled trials provide our best evidence but admitted that not every randomized controlled trial provides trustworthy evidence. Randomized controlled trials have intrinsic error rates and can become biased when they are stopped early. There is also publication bias, which makes positive studies more likely to be published than equally worthy negative ones. All of these factors increase the likelihood that we will adopt ineffective therapies.
In part 3 we will delve a little deeper into the causes of medical reversal. It turns out that faulty data can be pretty interesting. Sometimes inadequate data exist because of the way the scientific method proceeds, and other times we have inadequate data because of malfeasance. Before we get into juicy stories of manipulated data, let us consider how the scientific method itself can predispose us to reversal. Any proper exploration of this point requires a discussion of the scientific method and Thomas Kuhn. Finally, we end this chapter with another historical lesson, one drawn from a sociologist of medicine whose words more than 30 years ago seem prophetic today.
THE SCIENTIFIC METHOD AND SCIENTIFIC REVOLUTIONS
The scientific method is, at its most basic, a way of determining how the world works. We begin with knowledge in which we are confident; we formulate a question about something we do not yet understand; we hypothesize an answer to that question; and then we design an experiment, an opportunity to observe the world in a controlled and structured way, that will tell us whether our hypothesis is true or false. When done well, the scientific method allows us to extend our knowledge. This is the backbone of biomedical research. For example: we know that our cells have proteins that signal them to grow and proliferate; we also know that some types of cancers produce too many of these proteins; we hypothesize that a drug that disrupts these proteins will slow cancer growth; finally, we design an experiment to see whether our drug works—either on cells in a test tube or in actual people.
But science does not march forward in a clean and orderly way, with each experiment getting us closer to profound and all-encompassing truth. Science proceeds in revolutions. This is the provocative thesis of Thomas Kuhn’s book The Structure of Scientific Revolutions. First published in 1962, Kuhn’s book quickly became influential in the field of the philosophy of science. Although his work has fallen in and out of fashion and has received its share of criticism, it is instructive in understanding how the scientific method can predispose us to medical reversal.
In The Structure of Scientific Revolutions Kuhn argued that science proceeds in a series of well-accepted worldviews, or what Kuhn called “prevailing paradigms.”* Kuhn believed that there were three key periods in any scientific inquiry. First, there is the “pre-paradigmatic period.” During this time scientists lack an organizing theory, or story, to unite their observations. Experiments are haphazard, and theories are proposed only to be quickly abandoned. Usually, many competing theories for how things work are entertained at the same time. Over time, however, one theory or a group of related theories begin to gain traction and coalesce into a broader, explanatory model.
Once a paradigm is accepted, we enter a period called “normal science.” During normal science, scientists conduct experiments that reinforce the prevailing paradigm. Many people (and we are among them) think that Kuhn was a bit hard on the so-called normal scientists who do the work during this period. Kuhn makes them seem plodding, doing obvious work. In fact, most of the science we encounter daily is normal science, and it is incredibly important. When a research team announces that they have discovered a gene that increases the risk of breast cancer, that is normal science. The findings are published in major scientific journals and may improve health care. They are not, however, revolutionary. We already know that a person’s genes can increase or decrease susceptibility to disease, and it is normal science to apply this logic to a new problem. If the pre-paradigmatic period is like completing the straight border of a jigsaw puzzle, normal science is filling in the middle.
Sometimes, while normal science is being done, something disturbing happens: unexpected results occur. Kuhn labeled such results anomalies. An anomaly is an observation that does not quite fit the paradigm. Anomalies are not necessarily negative studies—a negative study can be just as likely as a positive one to support a paradigm. An anomaly is a finding that does not make any sense within the prevailing paradigm. Some anomalies can be incorporated into the paradigm, strengthening the overarching model, but sometimes anomalies accumulate and scientists have to take notice. This is the period Kuhn calls “revolutionary science.” Revolutionary science happens when anomalies become so plentiful that the entire paradigm reaches crisis and has to be abandoned. Scientific revolutions are infrequent, according to Kuhn, and this is a good thing. For the most part, paradigms make sense, unite our beliefs, and allow us to explore the natural world. Anomalies are rare events, and they only jeopardize a paradigm when the paradigm cannot be adjusted.
SCIENTIFIC REVOLUTIONS IN MEDICINE
The example that Kuhn most famously used to illustrate his theory of anomalies and revolutionary science was the transition from a geocentric theory of planetary motion, in which the planets revolve around the earth, to a heliocentric theory, in which the planets revolve around the sun. Kuhn identified the geocentric view, attributed to Ptolemy, as the prevailing paradigm for 1,500 years. The theory was complicated, but it did a reasonable job of explaining the locations of “heavenly objects.”
Then in 1543, Copernicus postulated that all planets revolve around the sun. His theory, however, assumed circular orbits, and as such it did not make better predictions than Ptolemy’s did. It also ran against the prevailing interpretation of scripture. Johannes Kepler later modified the theory, substituting elliptical orbits, and now the Copernican model made better predictions about where planets should be at any given time.
The anomalies to the Ptolemaic paradigm began to accumulate. Most importantly, in 1610, Galileo observed that Venus has phases (like the moon does). This fact could only be reconciled with a heliocentric, and not a geocentric, model of the solar system. This example, though a good one, is a little different from the classic progression that Kuhn describes, in the sense that the new paradigm came a little before the anomalies; it usually is the other way around.
Medicine has seen an occasional revolution, a discovery that completely changes the way we understand health and disease. The introduction of the germ theory of disease was certainly one of those. The prevailing paradigm before germ theory was that the diseases we now know to be infectious were transmitted by miasma. Miasma, or bad air, came from decomposed matter. A local epidemic, for example, would be blamed on the presence of miasma, since the idea that a disease could be passed from person to person had not yet been conceived.
There have been more recent revolutions in areas of medicine.* Until the 1980s, the prevailing paradigm was that peptic ulcer disease was a disease of overproduction of, or sensitivity to, acid in the gastrointestinal tract. There was clear evidence that certain exposures (extensive skin burns, cigarette smoking) made one more prone to ulcers, and there were treatments (antacids, surgery) that worked to heal the ulcers. However, there were anomalies that upset this paradigm. There were no convincing data that explained why some people got ulcers and others did not. And then there was this bacterium, probably first recognized in the 19th century, that seemed to live in the acidic environment of the stomach. In the 1980s Drs. Barry Marshall and Robin Warren were finally able to isolate the bacterium (Helicobacter pylori) and eventually prove its link to peptic ulcer disease.† A revolution: a disease that was thought to be caused by acid production alone and treated primarily with antacids now became an infectious disease treated primarily with antibiotics. Even today, though, the peptic-ulcer story remains open. We don’t fully understand why, of all the millions of people in the world who are infected with H. pylori, some develop disease but others do not.
Medical research frequently produces anomalies. When we are surprised by the results of a study, it is often because the results run counter to our worldview. We thought we understood a disease process and based a hypothesis on this understanding. When the hypothesis is disproved, we are forced to reconsider our understanding. Only rarely do anomalies accumulate and culminate in a revolution. More commonly, our understanding of the disease process is tailored to accommodate the new data.