Another morning, you are running late to work. Just as you get to your car, you realize you forgot your briefcase (or your keys, or your phone, or your wallet). You run back in to get it and are now five minutes further behind. As you drive to work, you see a terrible three-car accident. Steam hisses from an engine. People stand around looking shaken. It clearly just happened—maybe five minutes ago. Had it not been for the briefcase, would that have been you?
We begin this chapter with stories of cause and effect to get you thinking about when these relationships are clear and when they are not. In many fields (think foreign policy or finance), it is almost impossible to tease out what would have happened in the presence or absence of some action. Of course, this complexity does not stop experts from making definitive proclamations. Every day you hear that the stock market went up (or down) because of some report or event. If the relationship were really that predictable, would anybody lose money in the market?
In medicine, it turns out that cause-and-effect relationships are extraordinarily difficult to sort out, probably because the effects of most of our interventions are subtle and complex. For this reason, we argue that only randomized trials can sort out these relationships. Among medical studies, only RCTs are experiments: studies in which you compare identical groups, varying only a single factor, so that you can see whether your intervention has made a difference. But you do not need a randomized trial to determine whether pulling someone out of the way of an oncoming bus is beneficial—it is. However, you do need a randomized trial to determine whether requiring hospital workers to wear gowns and gloves decreases the rate of multidrug-resistant bacterial infections, because so many factors contribute to the occurrence of such infections.
This position regarding the necessity of RCTs may seem overly dogmatic. Critics of our position love to cite a tongue-in-cheek article published in the British Medical Journal in 2003. In the article, the authors set out to review all randomized controlled trials examining the benefit of parachutes for skydivers facing “gravitational challenge.” Not surprisingly, they found no such trials. The authors noted that free fall is not universally fatal, since some cases of survival have actually been reported. Thus they concluded that the evidence was insufficient to draw firm conclusions about the benefits of parachutes. Because of growing pressure to “medicalize” free fall, the authors closed by calling for a randomized, controlled trial. Noting that it might be hard to recruit people for such a trial, they suggested that “those who advocate evidence-based medicine and criticise use of interventions that lack an evidence base will not hesitate to demonstrate their commitment by volunteering for a double blind, randomised, placebo controlled, crossover trial.”
This article was both brilliant and funny and could be republished today as a response to this book. It is often cited when critics argue that we do not need randomized trials for every intervention. But jumping from a plane is very different from taking your blood-pressure medication. The use of parachutes when skydiving is more analogous to the Samaritan pulling you out of the path of a moving bus—the benefit is undeniable. Are there analogous situations in medicine, cases of an obviously beneficial intervention? Perhaps appendectomy for acute appendicitis?
TREATMENTS THAT SEEM OBVIOUS
Appendicitis is one of those diseases that most of us have had experience with—either personally or through a friend or family member. The appendix is a vestigial structure, a small pouch that comes off the first portion of our large intestine. Appendicitis is inflammation of this structure caused by a bacterial infection. It is a painful condition that, if not treated, can cause a life-threatening infection of the entire abdomen. Since the 1800s the treatment for appendicitis has been prompt surgical removal of the appendix, appendectomy. If there is an infection in a structure you do not need, you should take it out. Appendectomy was believed to be like a parachute. A person who has an appendectomy remembers pain, and often fever, up until the moment the anesthesia is given. When she wakes up, she feels much better. Appendectomy is the most common emergency surgical procedure in the United States. This is one intervention that many claimed would never be tested in a randomized trial. And yet, we now have four randomized trials that compared appendectomy to antibiotics.
These four trials studied more than 900 patients randomized to either antibiotics (with surgery reserved for those who got worse) or surgery. The results showed that using antibiotics to treat appendicitis may be better than taking people right to surgery. More than 60 percent of the patients did fine with antibiotics alone and did not require subsequent surgery. About 35 percent of the patients who started with antibiotics ended up having an appendectomy. The rates of life-threatening outcomes (the ones that made us think surgery was the only option), as well as time in the hospital, were comparable between the two groups, surgery first or antibiotics first. The authors of a review of this topic concluded that the best initial strategy for acute, run-of-the-mill appendicitis is antibiotics alone.
In one way the story of appendectomies for appendicitis is just one more example of treatments that seem like they should work until they are proved not to work—or, in this case, not to work any better than something less expensive and less invasive. But in another way, it has always seemed especially clear that appendectomies were the best treatment. So if appendectomy for appendicitis is not analogous to the parachute for skydivers, then what is? How about interventions whose benefit seems enormous?
TREATMENTS WITH LARGE BENEFITS
If the magnitude of a benefit is really huge, as in the case of parachutes—you are much, much more likely to survive a jump out of a plane with a parachute than without one—then maybe a randomized controlled trial is not necessary. Consider two problems with this supposition. First, interventions with a sizable magnitude of benefit are rare in medicine. Second, just because an intervention seems to confer a large benefit does not mean that it works.
How common in medicine are interventions that yield tremendous benefits? Tiago Pereira, Ralph Horwitz, and John Ioannidis tackled this question in 2012 in their paper entitled “Empirical Evaluation of Very Large Treatment Effects of Medical Interventions.” Pereira and colleagues asked a simple question: if you look at medical trials, how many show very large treatment effects? The authors defined a large treatment effect as at least a fivefold improvement in an outcome (meaning that, for example, if without treatment you had a 10 percent chance of dying, with treatment that risk should fall to 2 percent or less). In their study they looked at the results of more than 228,000 trials. They found that only 9 percent of the trials demonstrated a very large treatment effect. The topics with large effects were less likely to be about mortality and more likely to be about a laboratory value—it is hard to get a large effect for what really matters. The authors then looked at other studies that addressed the same questions as those that demonstrated the very large treatment effect. Strikingly, they found that 90 percent of the time, the large treatment effects got smaller when you looked at the other studies on the same question. This means that results demonstrating the largest magnitude of effect are more likely statistical flukes than truly important findings. Across all the studies, only one intervention had a large effect on mortality: extracorporeal membrane oxygenation (ECMO), a method to oxygenate the blood of newborns who cannot adequately breathe on their own.* In other words: parachutes are uncommon in medicine.
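To make the threshold concrete, here is a quick sketch of the arithmetic (ours, not from the paper), using the illustrative mortality numbers from the example above:

```python
# A sketch (not from Pereira's paper) of the "very large effect"
# threshold: at least a fivefold improvement in an outcome.
def fold_improvement(risk_without_treatment, risk_with_treatment):
    """How many times more likely the bad outcome is without treatment."""
    return risk_without_treatment / risk_with_treatment

# The example from the text: mortality falls from 10% to 2%.
print(fold_improvement(0.10, 0.02))        # 5.0
print(fold_improvement(0.10, 0.02) >= 5)   # True: qualifies as "very large"
```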
Accepting that treatments with large benefits are rare in medicine today, the next question becomes, Is there ever uncertainty in the effectiveness of treatments that seem to provide large benefits? If someone proposes a treatment, like ECMO, which leads to survival that far exceeds what we have seen historically, should we just accept it, or is there still the potential for uncertainty? The Milwaukee Protocol for the treatment of rabies is a good example here. The teaching in medical school is that rabies, a virus usually transmitted by the bite of an infected animal (bats, raccoons, dogs), is fatal 100 percent of the time. If you contract it, and you do not get the vaccine before you become symptomatic, you will die. Then in 2005, a 15-year-old girl in Milwaukee developed symptomatic rabies. She was treated with an experimental cocktail (now called the Milwaukee Protocol) of anesthetics and antivirals and survived. This treatment potentially offered a benefit of infinite magnitude. One might ask, why study this?
Of course the real story is complicated. First, it became clear that use of the Milwaukee Protocol did not always cure rabies. Some people died despite receiving this treatment. This was not actually surprising, as no one thought that a treatment for rabies would be 100 percent effective. More recently, research has shown that there may be quite a few people who survive rabies even without treatment. One survey of blood samples from a population in the Peruvian Amazon that has significant contact with vampire bats revealed that 6 of 63 people tested showed evidence of having survived rabies. Maybe the Milwaukee girl survived despite the treatment.
Rabies is so rare (thankfully) that we will probably never have a randomized controlled trial of the Milwaukee Protocol, but the example expands the argument that parachutes are rare in medicine. Few treatments promise enormous benefits, and even when they do, the cause-and-effect relationship between treatment and benefit, or the balance of risks and benefits, may be complicated.
This present situation, in which few, if any, treatments are of overwhelmingly obvious benefit, has not always existed. It is probably a symptom of our success. Ninety years ago we did not need an RCT to show that penicillin cured infections that did not previously get better, and we did not need an RCT to show that appendectomy was better than bed rest. Now, however, we need RCTs to show that the newest antibiotic is at least as good as our present ones and that our present antibiotics might be as good as that surgery we have relied on for years.
All that said, there have been some modern medical innovations that are so valuable that they really are like parachutes. One example is imatinib (Gleevec), which, almost overnight, revolutionized the treatment of a type of leukemia. Such innovations are exceedingly rare. Some say, Do not worry, more are coming. These people suggest that we are on the cusp of a new era of medical care and that the advances in this era will be so great, and so obvious, that RCTs will no longer be necessary.
PERSONALIZED MEDICINE
The source of this putative advance is genomic, also called personalized, medicine. In genomic medicine, doctors will tailor therapy specifically to our genetic makeup rather than extrapolating results from RCTs to an individual patient. Thus far, however, experience suggests that genomic medicine will not free us from the need for well-done experimental studies.
A few years ago, research showed that a cancer drug called cetuximab improved outcomes for patients with colorectal cancer. Since then, oncologists have learned that there are actually two types of colorectal cancer patients—a group of people who benefit from cetuximab and a group of people with certain genetic mutations who do not benefit. Patients in the latter group not only do not live longer when treated with cetuximab; they are actually harmed by it. With each passing year, we are discovering new mutations that allow us to better sort the two groups.
Our point is not about cetuximab but about medicine in general, including genomic medicine. Even drugs with great benefit, tailored to a specific mutation and proved to work in randomized trials, may not benefit all the patients we treat. We may not currently be able to detect the small subset of patients who are harmed—not knowing what distinguishes them from those who benefit—and, unfortunately, we still may not be able to make these distinctions for another 20 years. You hear a lot about how personalized medicine will sort out this problem, but although genetic understanding will lead to improvements, the problem will still exist. This issue is a limitation of empirical science; it is a problem of subsets. It will always be impossible for a doctor to know if a particular treatment will help any single patient. All we will ever be able to say is that, on average, this treatment will help many of the patients we think it will.
WHEN RCTs COMPLICATE THE EVIDENCE
Most people agree that RCTs are the best way to prove causality in medicine. RCTs, however, are not magical. They do not always explain away all the complexities that exist in evidence. First of all, not all RCTs are created equal; a poorly done RCT may be worse than a lesser trial design done well. Also, in today’s world of industry-designed trials, there is certainly bias in how RCTs are run, analyzed, and reported. We talk more about these issues in chapter 12. But even putting aside these points, RCTs sometimes actually complicate evidence. Even when they are done well, by committed, unbiased researchers who are seeking the truth, RCTs (or their combined data presented in meta-analyses) can lead us astray. It is worth considering three specific ways that RCTs may complicate, rather than clarify, cause-and-effect relationships: accepted error rates; premature termination of trials; and meta-analysis.
:: ERROR RATES IN RCTs
When you begin an RCT, you have two identical groups, one to which you will give the treatment of interest and one to which you will give the placebo. At the end of the study, you analyze these populations to see if they remain the same with respect to the outcome of interest. Your study is considered “positive” if the two populations are now different (meaning that your intervention had an effect) or “negative” if the populations remain indistinguishable. For the rest of our discussion, to make this clearly applicable to medical studies, let us say that a positive study shows a benefit and a negative study does not. So where do statistics come in?
Because your study looks at only a small number of people, not the entire population, your results will not always be correct—sometimes you will find a difference when there really is not one, and sometimes you will not find a difference even when one does exist. There are actually four possible outcomes:
:: You might find a benefit/difference when there is one: a true positive result
:: You might find a benefit/difference when there is not one: a false-positive result
:: You might not find a benefit/difference when there is not one: a true negative result
:: You might not find a benefit/difference when there is one: a false-negative result
Statisticians love to show this in a 2×2 table, which is in fact a very clear way of looking at the situation (despite statisticians’ usual love of complexity). See table 10.1.
In designing studies—especially in figuring out how many subjects you will need—you need to choose the level of uncertainty that you will accept. A study without uncertainty would include everybody in the world, but given the cost and complexity of that study, researchers have generally agreed on a level of error they are willing to tolerate. Although any misleading result is bad, a false-negative result is the lesser of the evils. In medicine, a false negative means that we fail to identify an intervention that is helpful. This is bad, but it is not as bad as a false-positive result. A false-positive result would lead us to adopt a useless treatment. This contradicts our “First, do no harm” commitment. Traditionally, therefore, we set error levels as follows: a standard trial will fail to recognize 20 percent of treatments that are actually beneficial (false negatives) and will mistakenly identify 5 percent of ineffective interventions as being effective (false positives). These percentages are also shown in table 10.1.
TABLE 10.1. The four possible outcomes of a study, with the conventional error rates

                     Treatment truly works      Treatment truly does not work
Positive study       true positive (80%)        false positive (5%)
Negative study       false negative (20%)       true negative (95%)
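To see where these two error rates come from in practice, consider a small simulation (a sketch of ours in Python; the sample size and effect size are illustrative assumptions chosen to give roughly 80 percent power). It runs many hypothetical two-arm trials and counts how often chance alone produces the wrong answer:

```python
# A sketch (not from the book) of the two conventional error rates.
# We simulate many hypothetical two-arm trials: in one batch the
# treatment truly works (a modest effect), in the other it does nothing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 10_000   # simulated trials of each kind
n_per_arm = 100     # patients per arm (illustrative assumption)
effect = 0.4        # true benefit, in standard-deviation units

def trial_is_positive(true_effect):
    """Run one simulated trial; 'positive' means p < 0.05."""
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    return stats.ttest_ind(treated, control).pvalue < 0.05

false_negatives = sum(not trial_is_positive(effect) for _ in range(n_trials))
false_positives = sum(trial_is_positive(0.0) for _ in range(n_trials))

print(f"false-negative rate: {false_negatives / n_trials:.0%}")  # ~20%
print(f"false-positive rate: {false_positives / n_trials:.0%}")  # ~5%
```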
Why are we telling you all this? These error rates become critically important when you try to figure out how likely it is that the studies you read are correct. It turns out that the likelihood that the positive study you read is a true positive, as opposed to a false positive, depends on how likely it was in the first place that the intervention you are studying would work. When we test a new intervention in an RCT, the most optimistic we can ever be that the intervention will be beneficial is 50 percent. This is because, ethically, we can only randomize people to a control group if there is true uncertainty about whether the treatment will help. We call this ethical principle equipoise, and it is a foundation of ethical clinical research. Table 10.2 is a 2×2 table representing studies of treatments that each have a 50 percent likelihood of being beneficial. For simplicity the table considers 1,000 studies, 500 of which will study a treatment that is truly beneficial and 500 of which will study a treatment that will fail to live up to expectations. Consider first the beneficial treatments (the left column): we will miss identifying 20 percent, or 100, of them because of where we set the false-negative error rate when we designed the study. Now consider the ineffective treatments (the right column): we will mistakenly identify 5 percent, or 25, of them as being effective. So, in the best of all worlds, if you read 425 positive studies, 400 of them will be true positives and 25 will be false positives. This is a 94 percent accuracy rate for a positive study. Not bad.
TABLE 10.2. 1,000 studies, each of a treatment with a 50 percent chance of being beneficial

                     500 beneficial treatments      500 ineffective treatments
Positive study       400 true positives             25 false positives
Negative study       100 false negatives            475 true negatives
TABLE 10.3. 1,000 studies, each of a treatment with a 10 percent chance of being beneficial

                     100 beneficial treatments      900 ineffective treatments
Positive study       80 true positives              45 false positives
Negative study       20 false negatives             855 true negatives
The problem is that for most clinical investigations, there is less than a 50 percent chance that the treatment being evaluated is effective. In fact, the studies that get people most excited are the ones in which an intervention that they never thought would work turns out to be effective. Table 10.3 describes the situation when there is only a 10 percent chance that the proposed treatment is beneficial. In this scenario, if there are 125 studies reporting a positive outcome, only 80 of them, or 64 percent, are correct. Thus, more than one-third of the treatments that you would adopt would actually be ineffective. These treatments would eventually turn out to be medical reversals. And remember, these are treatments that you would have adopted based on well-done RCTs, the best evidence available.
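The arithmetic behind tables 10.2 and 10.3 fits in a few lines. Here is a sketch (ours, in Python) that reproduces the 94 percent and 64 percent figures from nothing but the prior probability, the 80 percent power, and the 5 percent false-positive rate:

```python
# A sketch (ours) of the arithmetic behind tables 10.2 and 10.3.
def positive_study_accuracy(prior, power=0.80, alpha=0.05, n_studies=1000):
    """Fraction of positive studies that are true positives."""
    beneficial = prior * n_studies          # treatments that really work
    ineffective = n_studies - beneficial    # treatments that do not
    true_positives = power * beneficial     # real benefits we detect
    false_positives = alpha * ineffective   # chance "detections"
    return true_positives / (true_positives + false_positives)

print(f"{positive_study_accuracy(0.50):.0%}")  # 94% -- table 10.2
print(f"{positive_study_accuracy(0.10):.0%}")  # 64% -- table 10.3
```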
The situation is actually probably far worse. In 2005 John Ioannidis published a now-famous article entitled “Why Most Published Research Findings Are False.”* In addition to the unavoidable factors above, Ioannidis pointed out other issues that make a positive study even less likely to be true, including bias in how studies are designed (making them more likely to show a positive result), publication bias (positive studies are more likely to be published than negative ones), and other issues that markedly increase the false-positive rate among published studies. We discuss some of these issues in more depth in chapter 12. In the end he concluded that the likelihood that any positive study represents a true finding is probably far less than 50 percent and that even the best randomized trial is only true 85 percent of the time.
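Ioannidis's argument can also be written down compactly. Below is a sketch following the structure of his analysis (our code; the bias level is an illustrative assumption): a bias term u, the fraction of analyses that would otherwise have been negative but get reported as positive anyway, drags down the chance that a published positive result is true.

```python
# A sketch following the structure of Ioannidis's argument. R is the
# pre-study odds that the tested relationship is real; u is a bias term
# (illustrative assumption): the fraction of analyses that would have
# been negative but are reported as positive anyway.
def prob_positive_is_true(R, alpha=0.05, beta=0.20, u=0.0):
    true_pos = R * (1 - beta + u * beta)      # real effects found (or "found")
    false_pos = alpha + u * (1 - alpha)       # null effects reported positive
    return true_pos / (true_pos + false_pos)

print(f"{prob_positive_is_true(R=1.0):.0%}")         # 94%, matches table 10.2
print(f"{prob_positive_is_true(R=1.0, u=0.3):.0%}")  # 72%: bias erodes it
print(f"{prob_positive_is_true(R=1/9, u=0.3):.0%}")  # 22%: long shots fare worse
```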