Taking the Medicine: A Short History of Medicine’s Beautiful Idea, and our Difficulty Swallowing It
Exacerbating Cobbett's feeling of outrage was the fact that on the same day that Washington died, Rush's libel suit against him was concluded. 'The times are ominous indeed,' Cobbett's lawyer said in court, 'when quack to quack cries purge and bleed.' The judge, a man whom Cobbett had attacked in print years previously, found against him. Rush was awarded damages of $5,000. Cobbett pointed out that this exceeded the combined sum awarded in every similar court case since the United States had been founded. It was enough to make him flee back to Britain.
He lingered long enough before leaving, however, to attack the medical care that Washington had received, and to bring evidence as well as opinion to bear on his criticism of Rush. Having abandoned the Porcupine's Gazette, he used his new publication – pointedly called The Rush-Light – to examine the official registry of deaths in Philadelphia. After Rush began advertising his belief in heroic levels of bleeding, Cobbett noted, the city's mortality rate demonstrably rose.
The existence of other people holding different views from ourselves is a problem for all of us. Cobbett and Rush could have come together, recognised that one of them had to be wrong about the effects of bleeding and devised an experiment to settle their differences. Both were idealists, vigorous in their pursuit of many goals they held dear. Neither man doubted his ability to reason out truth without recourse to a test. It happens that Cobbett was right and Rush wrong, but their unwillingness to put their beliefs to a trial matters more. 'Ignorance is preferable to error,' wrote Jefferson in his 1782 Notes on Virginia, 'and he is less remote from the truth who believes nothing than he who believes what is wrong.' It was a lesson not to the taste of those who felt strong opinions were more manly than uncertainties and doubts.
A few years later, in 1803, Meriwether Lewis came to Rush in order to learn some medicine. He was sent by Jefferson, who wanted to prepare Lewis for his historic journey with William Clark across America to the Pacific North-west. Rush equipped Lewis with half a pound of opium, drugs to induce vomiting, fifty dozen mercury-based laxative pills, a pound of mercury (to be taken orally or injected directly into the penis in the event of picking up a sexually transmitted disease), fifteen pounds of Peruvian bark and two pounds of an ointment made up of animal fat, beeswax and pine resin. The expedition used so much of Rush's mercury that now, more than two centuries later, their rest stops can still be identified: the earth around the places they used as toilets remains contaminated with it.
The moral is not that doctors once did foolish things. The moral is that even the best of people let themselves down when they rely on untested theories, and that these failures kill people and stain history. Bleeding and mercury have gone out of fashion, untested certainties and over-confidence have not.
25 The Beauty of Doubts
TO WHAT EXTENT are the things that doctors do today proven to be useful?
When Archie Cochrane interrupted a talk in New Zealand in 1976 to call his friend Kerr White 'a damned liar' for suggesting that any more than 10 per cent of medical interventions were based on good evidence, he was not pulling his statistics out of the Wellington air. The figure came from a 1963 paper in Medical Care reporting the results of a fortnight's survey of nineteen family doctors in the north of England. They were asked to keep records of every prescription written over that period; at the end, the drugs they prescribed were compared with the conditions they were trying to treat, and an attempt was made to determine how many were supported by reliable evidence. The figure came out as 9.3 per cent – inflated by Cochrane to a round 10.
Efforts to extend the degree to which medical practice is based on sound evidence have been going on – with stuttering success – throughout human history. The power of randomised controlled trials, and the extent to which most of what doctors did was not backed up by them, became increasingly apparent over the course of the twentieth century. Statistical work establishing the effectiveness of medical interventions was called 'clinical epidemiology' for most of that period. The name came to seem too obscure and off-putting for what was felt to be universally important and relevant. As a result, a different term appeared during the 1980s, emerging from medical teaching sessions at McMaster University in Canada. 'Evidence-based medicine' (EBM) first appears in the literature in a 1991 article in the Journal of the American Medical Association. These days the term is widely used. In its mocking tautology, 'evidence-based medicine' is clearly a propaganda term. It is shorthand for a school of thinking that holds that certain types of evidence are generally more robust and valuable than others – experiments more than guesses, trials more than anecdotes, interventions more than observations.
Many doctors loathe the term ‘evidence-based medicine’, their hackles raised by its campaigning tone and its implication that they are doing something different. Arguments are frequently made that it is a movement that seeks experimental proof for the most ludicrous of things and in the most thoughtless of ways. A good example is the 2003 British Medical Journal paper by Gordon Smith and Jill Pell, entitled ‘Parachute Use to Prevent Death and Major Trauma Related to Gravitational Challenge’:
As with many interventions intended to prevent ill health, the effectiveness of parachutes has not been subjected to rigorous evaluation by using randomised controlled trials. Advocates of evidence based medicine have criticised the adoption of interventions evaluated by using only observational data. We think that everyone might benefit if the most radical protagonists of evidence based medicine organised and participated in a double blind, randomised, placebo controlled, crossover trial of the parachute.
In contrast, advocates of EBM seem happy to accept that interventions like parachutes are clearly helpful. A 1995 Lancet paper, 'Inpatient general medicine is evidence based', gives a good guide to the standards of evidence that are actually required by 'the most radical protagonists of evidence based medicine', as well as suggesting that medicine has improved since 1963. One of the paper's authors was the Canadian doctor David Sackett, who has been amongst the foremost evangelists for the EBM movement. The paper looked at all the treatments given to patients coming under the care of Sackett's team of physicians in a single month at the John Radcliffe Hospital in Oxford. Sackett commented:
We found that a service that ran like ours and worked hard to find the best evidence to guide its interventions could treat 53% of its patients on the basis of SRs [systematic reviews bringing together combinations of high-quality trials] and RCTs [Randomised Controlled Trials], another 29% on the basis of convincing non-experimental evidence, and just 19% on the basis of guessing and hope.
Over 80 per cent of decisions being based on good evidence is a stunning improvement, even allowing that it was measured in a medical team led by a physician with an avowed devotion to following the evidence. Sackett gave an example of the sort of treatment that he felt did not require randomised controlled evidence before being accepted as true: giving an electric shock to someone's heart when it has stopped beating. In medical terms, such a shock is analogous to using a parachute. There are rare cases of people surviving without either – falling 10,000 feet into trees and snow, or spontaneously having their heart recover a normal rhythm – but in general the intervention, be it a parachute or an electric shock, is required for survival.
The study sparked off a series of similar surveys in different medical settings and specialities. Two looked at the evidence base within the world of family doctors. The first, a study from Leeds (Gill et al.), was published in the British Medical Journal in 1996. Two days' worth of consultations within a single family practice came up with similar figures to the Lancet paper, with 31 per cent of treatments being based on RCT evidence and 51 per cent on 'convincing non-experimental evidence'. The same year, and also in the British Medical Journal, a group of Japanese family doctors led by Koki Tsuruoka reported the results of reviewing forty-nine of their own consultations (around half the number in the Oxford and Leeds studies). Using the same criteria as the other two studies for deciding what counted as convincing proof, they found that 81 per cent of their treatments were based on good evidence.
Repeating the 1995 Lancet study in Kyoto University Hospital, Hiroshi Koyama and colleagues looked at how many of their treatment decisions were based on RCT evidence. Writing in the International Journal for Quality in Health Care in 2002, they looked at 211 different therapeutic interventions, finding 49 per cent of them to be supported specifically by RCTs, roughly the same as for Sackett’s hospital team in Oxford.
Other specialities have repeated these efforts to assess the extent to which their practices are based on evidence. In a 2006 paper in the online journal BMC Women's Health, Aamir Khan and colleagues from Birmingham, England, looked at obstetrics and gynaecology, reviewing 325 consecutive inpatients from 1998 and 1999 and finding 42 per cent of the interventions the patients received to be based on RCTs.
A 1998 paper from Great Ormond Street Children's Hospital suggested that contemporary paediatric surgery was less well founded on relevant research. Investigating a month of operations at this leading hospital, Baraldini and other surgeons concluded that 26 per cent of seventy operations were founded on RCTs, while 3 per cent fell into the category of being self-evidently helpful. Another 3 per cent, in retrospect, appeared to go against all available evidence, and the remaining 68 per cent had no adequate evidence base either way. An audit by ophthalmic surgeons in Hong Kong (Lai et al.), published in 2003 in the British Journal of Ophthalmology, found that 43 per cent of their 274 consecutive interventions in July 2002 were supported by RCTs, 34 per cent had the backing of poorer-quality observational evidence, and the remaining 23 per cent were either unsupported by any evidence whatsoever, or contradicted by it.
Other measurements of the extent to which medicine is now evidence-based have been similar to those given in the papers above. It is clear that we can be more confident in medical therapeutics today than we were in 1963; it is not only that our treatments have improved, it is also that we now have a much more certain knowledge of what their real effects are.
What about the treatments that some of these surveys judged so 'obviously' effective that they required no RCT evidence? The Kyoto team of Hiroshi Koyama declared forty-seven treatments to fit this category, including (as at Oxford) the example of delivering an electric shock to someone after their heart has stopped. They also listed removing the appendix of a patient with appendicitis, giving oxygen to people in respiratory difficulty, watchful inactivity for those with glandular fever, blood-thinning with warfarin for people with deep-vein thromboses and the administration of insulin or thyroid hormones to those whose bodies have stopped producing them.
The 1996 family doctor study from Leeds (Gill et al.) also included giving thyroid hormones in its list of forty-three 'interventions substantiated by convincing non-experimental evidence'. Other remedies seem similarly clear, like fluids for those who are dehydrated. But the list also included therapies that appear more immediately questionable, such as a range of specific antibiotics for particular infections. Some of these infections, like tonsillitis and chest infections, are often viral rather than bacterial in origin – and although the antibiotics mentioned were very safe ones, they, like all drugs, still sometimes cause harm. (It is a reliable generalisation that the only drug with no side effects is one that does nothing at all.) Strong painkillers for back pain were also listed as unarguably good. That they relieve pain is likely to be true, but given that gentler analgesics might provide useful benefits with lower risks of serious harms (such as gastro-intestinal bleeding), calling them unarguable is questionable.
Contemporary mistakes about medical knowledge seem to come from two main directions. The first is the failure to properly test a hypothesis because it seems so obviously true. An excellent recent example is that of hormone replacement therapy (HRT). For decades post-menopausal women took hormones to replace those that their own bodies had stopped producing. The idea that this was good was based on theories of human physiology; it was reasonable to suspect that replacing in old age a set of hormones that were present in youth was helpful. Later, observations showed that women who took replacement hormones after their menopause actually did live longer and enjoy better health than women who did not.
The trouble was that these observations were taken as constituting an experiment. They did not. Women were not being randomly allocated to either taking the hormones or not taking them – they were choosing. That meant it was entirely possible that the sort of women who made one choice were different from those who made another. Nevertheless it took until 1993 for a relevant trial to be set up. The Women's Health Initiative was an American study that enrolled over 160,000 post-menopausal women; those in its hormone trials were randomly allocated to either HRT or placebo. In 2002 the trial was stopped early after the number of cases of breast cancer was found to be higher than expected in the group taking HRT. British estimates suggest that HRT use in the UK alone was causing 2,000 extra cases of breast cancer a year. Despite this, the Women's Health Initiative study had not actually been set up to test whether HRT was safe. It was established because doctors believed it would prove that HRT was saving lives.
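The danger of mistaking self-selection for an experiment can be made concrete with a toy simulation. The numbers below are entirely hypothetical (they are not the Women's Health Initiative data): suppose healthier women are simply more likely to choose hormones, and suppose the drug itself slightly raises the risk of death. The observational comparison still makes the drug look protective; only randomisation recovers the true effect.

```python
import random

random.seed(0)

def death_risk(healthy, on_hrt):
    # Hypothetical numbers: a 10% baseline risk of death, halved in
    # healthier women, with the drug itself adding 2 points of harm.
    risk = 0.10
    if healthy:
        risk -= 0.05
    if on_hrt:
        risk += 0.02
    return risk

N = 100_000
women = [random.random() < 0.5 for _ in range(N)]  # True = healthier

# Observational 'study': healthier women are far more likely to choose HRT.
observational = [(h, random.random() < (0.8 if h else 0.2)) for h in women]
# Randomised trial: a coin flip decides, regardless of health.
randomised = [(h, random.random() < 0.5) for h in women]

def death_rates(cohort):
    deaths = {True: [], False: []}
    for healthy, on_hrt in cohort:
        deaths[on_hrt].append(random.random() < death_risk(healthy, on_hrt))
    return {group: sum(d) / len(d) for group, d in deaths.items()}

for name, cohort in [("observational", observational), ("randomised", randomised)]:
    rates = death_rates(cohort)
    print(f"{name}: death rate on HRT {rates[True]:.3f}, off HRT {rates[False]:.3f}")
```

Run it and the observational cohort shows a lower death rate among the hormone-takers, because the healthier women have gathered in that group, while the randomised cohort reveals the drug's true, harmful effect. The choice, not the drug, produced the apparent benefit.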
The second mistake commonly still made is accepting trial evidence of the right kind, but which has not actually been carried out well enough to be reliable. Antidepressants are a good example. There have been hosts of studies, many of them randomised, double-blinded and controlled. But the studies have been undercut by being too small, too short, too badly designed and too vulnerable to being misrepresented by those with vested interests. That such trials continue to be accepted, both by governments and doctors, comes from the failure of both to understand the nature and importance of a good evidence base.
A bad study is clearly untrustworthy: those carried out in the early days of thalidomide, for instance, made no serious attempt to assess the drug objectively. When it comes to depression, there are a host of drugs available, many only very slightly different from one another. What we ideally want to know is the precise effect of every one of these drugs compared to every other one, over a long period and in relation to the effects that most matter to people – in this case whether they are helped to be safe, healthy and happy.
Drug companies fund trials only as much as they need in order to persuade doctors to prescribe a treatment and governments to allow it. That leads to problems. A survey of the evidence behind twelve different antidepressants appeared in the New England Journal of Medicine in 2008 (Turner et al.). It looked at the trial data that drug companies submitted to the FDA when applying for regulatory approval, and compared that to the data that were eventually published and available for public view. Drug companies are obliged to register clinical studies with the FDA, and to submit results regardless of what they show. They are not obliged to publish them. The paper found seventy-four relevant studies covering over 12,000 patients. 'Studies viewed by the FDA as having negative or questionable results', said the paper, 'were, with 3 exceptions, either not published (22 studies) or published in a way that, in our opinion, conveyed a positive outcome (11 studies). According to the published literature, it appeared that 94% of the trials conducted were positive. By contrast, the FDA analysis showed that 51% were positive.' The authors felt unable to conclude whether this difference in what was publicly presented – an effect called 'publication bias' – was because drug companies put forward only their most favourable results, or because medical journals were uninterested in printing studies showing treatments had no compelling effects. The drugs approved by the FDA all showed benefit when all relevant results were brought together, but the paper found that these benefits were not accurately presented to the medical profession – 'selective publication', they concluded, 'can lead doctors to make inappropriate prescribing decisions that may not be in the best interest of their patients'. An earlier 2004 paper (by Whittington et al.) in the Lancet found the same when looking at selective serotonin reuptake inhibitors (SSRIs), a class of antidepressant drug, when used for depression in children. The effectiveness of the drugs appeared quite different when unpublished drug company trials were combined with those that had been published – benefits had appeared to outweigh harms, but adding in the extra data suggested the opposite.
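The arithmetic behind the Turner figures – the gap between 94 and 51 per cent – is worth making explicit. Here is a minimal reconstruction from the numbers quoted above; it rests on one assumption not stated in the excerpt, flagged in the comments, that every trial the FDA judged positive was also published.

```python
# Reconstructing the Turner et al. (2008) percentages from the figures above.
# Assumption (not stated in the text): every FDA-positive trial was published.

total_trials = 74
fda_positive = round(0.51 * total_trials)   # ~38 trials positive in the FDA's view
fda_negative = total_trials - fda_positive  # ~36 negative or questionable

unpublished = 22         # negative trials that never appeared in print
spun_positive = 11       # negative trials published as if positive
published_negative = 3   # the honest exceptions

# Sanity check: the three fates account for all the negative trials.
assert unpublished + spun_positive + published_negative == fda_negative

published = fda_positive + spun_positive + published_negative
appear_positive = fda_positive + spun_positive

print(f"FDA view:       {fda_positive / total_trials:.0%} positive")  # ~51%
print(f"Published view: {appear_positive / published:.0%} positive")  # ~94%
```

The raw evidence barely favoured the drugs; the published evidence appeared to favour them overwhelmingly.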
Are these effects important? A 2004 study (by An-Wen Chan and others) in the Journal of the American Medical Association said yes. Chan's team looked at clinical trials approved between 1994 and 1995 in Denmark, then followed up the way they had been presented. Between the time they received ethical approval and the time they published their results, almost two thirds of trials (62 per cent) changed what they said they were chiefly measuring – an excellent way of adjusting any trial to make it reach the conclusion you want. (By statistical convention, a finding is regarded as significant if there is less than a one-in-twenty chance of it happening by luck alone. Therefore for every twenty tests you perform, one is likely to appear positive when it represents nothing other than the play of chance. A good study will declare from the beginning the main thing it is aiming to test, and stick to it.) 'The reporting of trial outcomes is not only frequently incomplete,' found Chan's study, 'but also biased and inconsistent with protocols. Published articles, as well as reviews that incorporate them, may therefore be unreliable and overestimate the benefits of an intervention.' The authors wanted the regulatory rules changed to force researchers to publish their results in a more accurate and comprehensive manner.
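That one-in-twenty convention is easy to demonstrate for yourself. The sketch below is hypothetical: it simulates trials of a completely useless drug in which twenty separate outcomes are measured, and counts how often at least one crosses the conventional significance bar by luck alone.

```python
import random

random.seed(1)

TRIALS = 10_000   # simulated trials of a drug with no effect at all
OUTCOMES = 20     # outcomes measured in each trial
ALPHA = 0.05      # the conventional one-in-twenty significance threshold

false_wins = 0
for _ in range(TRIALS):
    # With no real effect, each outcome still has a 5% chance of
    # looking 'significant' by chance.
    if any(random.random() < ALPHA for _ in range(OUTCOMES)):
        false_wins += 1

print(f"Trials with at least one 'significant' outcome: {false_wins / TRIALS:.0%}")
# Expect about 1 - 0.95**20, roughly 64%.
```

Roughly two times in three, a useless drug measured against twenty outcomes will show at least one 'significant' result – which is why a trial that quietly switches its primary outcome after the fact can almost always declare success.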