Siddhartha Mukherjee - The Emperor of All Maladies: A Biography of Cancer
The ACS's massive campaign was called the Breast Cancer Detection and Demonstration Project (BCDDP). Notably, this was not a trial but, as its name suggested, a "demonstration." There was no treatment or control group. The project intended to screen nearly 250,000 women in a single year, nearly eight times the number screened by Strax in three years, in large part to show that it was possible to muscle through mammographic screening at a national level. Mary Lasker backed it strongly, as did virtually every cancer organization in America. Mammography, the "discarded procedure," was about to become enshrined in the mainstream.
But even as the BCDDP forged ahead, doubts were gathering over the HIP study. Shapiro, recall, had chosen to randomize the trial by placing the "test women" and "control women" into two groups and comparing mortality. But, as was common practice in the sixties, the control group had not been informed of its participation in a trial. It had been a virtual group--a cohort drawn out of the HIP's records. When a woman had died of breast cancer in the control group, Strax and Shapiro had dutifully updated their ledgers, but--trees falling in statistical forests--the group had been treated as an abstract entity, unaware even of its own existence.
In principle, comparing a virtual group to a real group would have been perfectly fine. But as the trial enrollment had proceeded in the mid-1960s, Strax and Shapiro had begun to worry whether some women already diagnosed with breast cancer might have entered the trial. A screening examination would, of course, be a useless test for such women since they already carried the disease. To correct for this, Shapiro had begun to selectively remove such women from both arms of the trial.
Removing such subjects from the mammography test group was relatively easy: the radiologist could simply ask a woman about her prior history before she underwent mammography. But since the control group was a virtual entity, there could be no virtual asking. It would have to be culled "virtually." Shapiro tried to be dispassionate and rigorous by pulling equal numbers of women from the two arms of the trial. But in the end, he may have chosen selectively. Possibly, he overcorrected: more patients with prior breast cancer were eliminated from the screened group. The difference was small--only 434 patients in a trial of 30,000--but statistically speaking, fatal. Critics now charged that the excess mortality in the unscreened group was an artifact of the culling. The unscreened group had been mistakenly overloaded with more patients with prior breast cancer--and the excess death in the untreated group was merely a statistical artifact.
Mammography enthusiasts were devastated. What was needed, they admitted, was a fair reevaluation, a retrial. But where might such a trial be performed? Certainly not in the United States--with two hundred thousand women already enrolled in the BCDDP (and therefore not eligible for another trial), and its bickering academic community shadowboxing over the interpretation of shadows. Scrambling blindly out of controversy, the entire community of mammographers overcompensated as well. Rather than build experiments methodically on other experiments, they launched a volley of parallel trials that came tumbling out over each other. Between 1976 and 1992, enormous parallel trials of mammography were launched in Europe: in Edinburgh, Scotland, and in several sites in Sweden--Malmo, Kopparberg, Ostergotland, Stockholm, and Goteborg. In Canada, meanwhile, researchers lurched off on their own randomized trial of mammography, called the National Breast Screening Study (CNBSS). As with so much in the history of breast cancer, mammographic trial-running had turned into an arms race, with each group trying to better the efforts of the others.
Edinburgh was a disaster. Balkanized into hundreds of isolated and disconnected medical practices, it was a terrible trial site to begin with. Doctors assigned blocks of women to the screening or control groups based on seemingly arbitrary criteria. Or, worse still, women assigned themselves. Randomization protocols were disrupted. Women often switched between one group and the other as the trial proceeded, paralyzing and confounding any meaningful interpretation of the study as a whole.
The Canadian trial, meanwhile, epitomized precision and attention to detail. In the summer of 1980, a heavily publicized national campaign involving letters, advertisements, and personal phone calls was launched to recruit thirty-nine thousand women to fifteen accredited centers for screening mammography. When a woman presented herself at any such center, she was asked some preliminary questions by a receptionist, asked to fill out a questionnaire, then examined by a nurse or physician, after which her name was entered into an open ledger. The ledger--a blue-lined notebook was used in most clinics--circulated freely. Randomized assignment was thus achieved by alternating lines in that notebook. One woman was assigned to the screened group, the woman on the next line to the control group, the third line to the screened, the fourth to the control, and so forth.
Note carefully that sequence of events: a woman was typically randomized after her medical history and examination. That sequence was neither anticipated nor prescribed in the original protocol (detailed manuals of instruction had been sent to each center). But that minute change completely undid the trial. The allocations that emerged after those nurse interviews were no longer random. Women with abnormal breast or lymph node examinations were disproportionately assigned to the mammography group (seventeen to the mammography group; five to the control arm, at one site). So were women with prior histories of breast cancer. So, too, were women known to be at "high risk" based on their past history or prior insurance claims (eight to mammography; one to control).
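The mechanism described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reconstruction of what happened in the Canadian trial: the 5 percent share of high-risk women, the 50 percent chance that a high-risk woman trades places with the next woman in line, and all function names are hypothetical. It contrasts an open, alternating ledger (where anyone can see which arm the next line belongs to) with a concealed coin-flip assignment.

```python
import random


def alternate_allocation(risks, can_reorder, rng):
    """Assign women to arms by alternating ledger lines.

    risks: list of booleans, True for a woman flagged high-risk at her exam.
    can_reorder: if True, a high-risk woman about to land on a control line
    may (hypothetically, half the time) trade places with the next low-risk
    woman, so that she lands on a screening line instead.
    """
    queue = list(risks)
    arms = {"screened": [], "control": []}
    for line in range(len(queue)):
        arm = "screened" if line % 2 == 0 else "control"
        if (can_reorder and arm == "control" and queue[line]
                and line + 1 < len(queue) and not queue[line + 1]
                and rng.random() < 0.5):
            # The open ledger makes the upcoming arm visible, so the swap is possible.
            queue[line], queue[line + 1] = queue[line + 1], queue[line]
        arms[arm].append(queue[line])
    return arms


def concealed_allocation(risks, rng):
    """Assign each woman by a coin flip that nobody can see in advance."""
    arms = {"screened": [], "control": []}
    for high_risk in risks:
        arm = "screened" if rng.random() < 0.5 else "control"
        arms[arm].append(high_risk)
    return arms


def high_risk_counts(arms):
    """Return (high-risk women in screened arm, high-risk women in control arm)."""
    return sum(arms["screened"]), sum(arms["control"])


rng = random.Random(0)
# Hypothetical cohort: 20,000 women, roughly 5 percent flagged high-risk.
risks = [rng.random() < 0.05 for _ in range(20000)]

print("open ledger, honest:   ", high_risk_counts(alternate_allocation(risks, False, rng)))
print("open ledger, subverted:", high_risk_counts(alternate_allocation(risks, True, rng)))
print("concealed coin flip:   ", high_risk_counts(concealed_allocation(risks, rng)))
```

In this toy model the subverted ledger piles high-risk women into the screened arm, while the honest ledger and the concealed coin flip leave the two arms roughly balanced. The point is only that an allocation sequence visible before enrollment can be gamed, whatever the motive.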
The reasons for this skew are still unknown. Did the nurses allocate high-risk women to the mammography group to confirm a suspicious clinical examination--to obtain a second opinion, as it were, by X-ray? Was that subversion even conscious? Was it an unintended act of compassion, an attempt to help high-risk women by forcing them to have mammograms? Did high-risk women skip their turn in the waiting room to purposefully fall into the right line of the allocation book? Were they instructed to do so by the trial coordinators--by their examining doctors, the X-ray technicians, the receptionists?
Teams of epidemiologists, statisticians, radiologists, and at least one group of forensic experts have since pored over those scratchy notebooks to try to answer these questions and decipher what went wrong in the trial. "Suspicion, like beauty, lies in the eye of the beholder," one of the trial's chief investigators countered. But there was plenty to raise suspicion. The notebooks were pockmarked with clerical errors: names changed, identities reversed, lines whited out, names replaced or overwritten. Testimonies by on-site workers reinforced these observations. At one center, a trial coordinator selectively herded her friends to the mammography group (hoping, presumably, to do them a favor and save their lives). At another, a technician reported widespread tampering with randomization with women being "steered" into groups. Accusations and counteraccusations flew through the pages of academic journals. "One lesson is clear," the cancer researcher Norman Boyd wrote dismissively in a summary editorial: "randomization in clinical trials should be managed in a manner that makes subversion impossible."
But such smarting lessons aside, little else was clear. What emerged from that fog of details was a study even more imbalanced than the HIP study. Strax and Shapiro had faltered by selectively depleting the mammography group of high-risk patients. The CNBSS faltered, skeptics now charged, by succumbing to the opposite sin: by selectively enriching the mammography group with high-risk women. Unsurprisingly, the result of the CNBSS was markedly negative: if anything, more women died of breast cancer in the mammography group than in the unscreened group.
It was in Sweden, at long last, that this stuttering legacy finally came to an end. In the winter of 2007, I visited Malmo, the site for one of the Swedish mammography trials launched in the late 1970s. Perched almost on the southern tip of the Swedish peninsula, Malmo is a bland, gray-blue industrial town set amid a featureless, gray-blue landscape. The bare, sprawling flatlands of Skane stretch out to its north, and the waters of the Oresund strait roll to the south. Battered by a steep recession in the mid-1970s, the region had economically and demographically frozen for nearly two decades. Migration into and out of the city had shrunk to an astonishingly low 2 percent for nearly twenty years. Malmo had been in limbo with a captive cohort of men and women. It was the ideal place to run a difficult trial.
In 1976, forty-two thousand women enrolled in the Malmo Mammography Study. Half the cohort (about twenty-one thousand women) was screened yearly at a small clinic outside the Malmo General Hospital, and the other half not screened--and the two groups have been followed closely ever since. The experiment ran like clockwork. "There was only one breast clinic in all of Malmo--unusual for a city of this size," the lead researcher, Ingvar Andersson, recalled. "All the women were screened at the same clinic year after year, resulting in a highly consistent, controlled study--the most stringent study that could be produced."
In 1988, at the end of its twelfth year, the Malmo study reported its results. Overall, 588 women had been diagnosed with breast cancer in the screened group, and 447 in the control group--underscoring, once again, the capacity of mammography to detect early cancers. But notably, at least at first glance, early detection had not translated into overwhelming numbers of lives saved. One hundred and twenty-nine women had died of breast cancer--sixty-three in the screened and sixty-six in the unscreened--with no statistically discernible difference overall.
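The arithmetic behind "no statistically discernible difference" can be sketched with a standard two-proportion z-test. This is an illustration, not the analysis the Malmo investigators performed: it assumes exactly 21,000 women in each arm (the text says only "about twenty-one thousand"), and the function name is hypothetical.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Compare two event rates; return (relative risk, z statistic, two-sided p-value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis that both arms share one death rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a / p_b, z, p_value


# 63 breast cancer deaths in the screened arm, 66 in the unscreened arm;
# the arm sizes of 21,000 each are an assumption.
rr, z, p_value = two_proportion_z_test(63, 21000, 66, 21000)
print(f"relative risk = {rr:.3f}, z = {z:.3f}, p = {p_value:.3f}")
```

Under these assumed arm sizes the relative risk is 63/66, a difference of three deaths, and the p-value falls far above the conventional 0.05 threshold, which is consistent with the text's description of the overall result.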
But there was a pattern behind the deaths. When the groups were analyzed by age, women above fifty-five years had benefited from screening, with a reduction in breast cancer deaths by 20 percent. In younger women, in contrast, screening with mammography showed no detectable benefit.
This pattern--a clearly discernible benefit for older women, and a barely detectable benefit in younger women--would be confirmed in scores of studies that followed Malmo. In 2002, twenty-six years after the launch of the original Malmo experiment, an exhaustive analysis combining all the Swedish studies was published in the Lancet. In all, 247,000 women had been enrolled in these trials. The pooled analysis vindicated the Malmo results. In aggregate, over the course of fifteen years, mammography had resulted in 20 to 30 percent reductions in breast cancer mortality for women aged fifty-five to seventy. But for women below fifty-five, the benefit was barely discernible.
Mammography, in short, was not going to be the unequivocal "savior" of all women with breast cancer. Its effects, as the statistician Donald Berry describes it, "are indisputable for a certain segment of women--but also indisputably modest in that segment." Berry wrote, "Screening is a lottery. Any winnings are shared by the minority of women. . . . The overwhelming proportion of women experience no benefit and they pay with the time involved and the risks associated with screening. . . . The risk of not having a mammogram until after age 50 is about the same as riding a bicycle for 15 hours without a helmet." If all women across the nation chose to ride helmetless for fifteen hours straight, there would surely be several more deaths than if they had all worn helmets. But for an individual woman who rides her bicycle helmetless to the corner grocery store once a week, the risk is so minor that some would dismiss it outright.
In Malmo, at least, this nuanced message has yet to sink in. Many women from the original mammographic cohort have died (of various causes), but mammography, as one Malmo resident described it, "is somewhat of a religion here." On the windy winter morning that I stood outside the clinic, scores of women--some over fifty-five and some obviously younger--came in religiously for their annual X-rays. The clinic, I suspect, still ran with the same efficiency and diligence that had allowed it, after disastrous attempts in other cities, to rigorously complete one of the most seminal and difficult trials in the history of cancer prevention. Patients streamed in and out effortlessly, almost as if running an afternoon errand. Many of them rode off on their bicycles--oblivious of Berry's warnings--without helmets.
Why did a simple, reproducible, inexpensive, easily learned technique--an X-ray image to detect the shadow of a small tumor in the breast--have to struggle for five decades and through nine trials before any benefit could be ascribed to it?
Part of the answer lies in the complexity of running early-detection trials, which are inherently slippery, contentious, and prone to error. Edinburgh was undone by flawed randomization; the BCDDP by nonrandomization. Shapiro's trial was foiled by a faulty desire to be dispassionate; the Canadian trial by a flawed impulse to be compassionate.
Part of the answer lies also in the old conundrum of over- and underdiagnosis--although with an important twist. A mammogram, it turns out, is not a particularly good tool for detecting early breast cancer. Its false-positive and false-negative rates make it far from an ideal screening test. But the fatal flaw in mammography is that these rates are not absolute: they depend on age. For women above fifty-five, the incidence of breast cancer is high enough that even a relatively poor screening tool can detect an early tumor and provide a survival benefit. For women between forty and fifty years, though, the incidence of breast cancer sinks to a point that a "mass" detected on a mammogram, more often than not, turns out to be a false positive. To use a visual analogy: a magnifying lens designed to make small script legible does perfectly well when the font size is ten or even six points. But then it hits a limit. At a certain font size, chances of reading a letter correctly become about the same as reading a letter incorrectly. In women above fifty-five, where the "font size" of breast cancer incidence is large enough, a mammogram performs adequately. But in women between forty and fifty, the mammogram begins to squint at an uncomfortable threshold--exceeding its inherent capacity to become a discriminating test. No matter how intensively we test mammography in this group of women, it will always be a poor screening tool.
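The dependence on incidence follows from Bayes' rule, and a short calculation makes it concrete. The sketch below is illustrative only: the sensitivity, specificity, and prevalence figures are hypothetical placeholders, not measured values for mammography, and the function name is an assumption.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive test reflects true disease (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held fixed across age groups.
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Hypothetical prevalence of undetected breast cancer in two age groups.
for label, prevalence in [("lower-incidence group", 0.002),
                          ("higher-incidence group", 0.010)]:
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: a positive result is a true cancer {ppv:.1%} of the time")
```

With the same test applied to both groups, the share of positive results that turn out to be real cancers rises with incidence. When the disease is rare enough, false positives outnumber true positives even for a reasonably accurate test, which is the sense in which a detected "mass" is more often than not a false alarm.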
But the last part of the answer lies, surely, in how we imagine cancer and screening. We are a visual species. Seeing is believing, and to see cancer in its early, incipient form, we believe, must be the best way to prevent it. As the writer Malcolm Gladwell once described it, "This is a textbook example of how the battle against cancer is supposed to work. Use a powerful camera. Take a detailed picture. Spot the tumor as early as possible. Treat it immediately and aggressively. . . . The danger posed by a tumor is represented visually. Large is bad; small is better."
But powerful as the camera might be, cancer confounds this simple rule. Since metastasis is what kills patients with breast cancer, it is, of course, generally true that the ability to detect and remove premetastatic tumors saves women's lives. But it is also true that just because a tumor is small does not mean that it is premetastatic. Even relatively small tumors barely detectable by mammography can carry genetic programs that make them vastly more likely to metastasize early. Conversely, large tumors may inherently be genetically benign--unlikely to invade and metastasize. Size matters, in other words--but only to a point. The difference in the behavior of tumors is not just a consequence of quantitative growth, but of qualitative growth.
A static picture cannot capture this qualitative growth. Seeing a "small" tumor and extracting it from the body does not guarantee our freedom from cancer--a fact that we still struggle to believe. In the end, a mammogram or a Pap smear is a portrait of cancer in its infancy. Like any portrait, it is drawn in the hopes that it might capture something essential about the subject--its psyche, its inner being, its future, its behavior. "All photographs are accurate," the artist Richard Avedon liked to say, "[but] none of them is the truth."
But if the "truth" of every cancer is imprinted in its behavior, then how might one capture this mysterious quality? How could scientists make that crucial transition between simply visualizing cancer and knowing its malignant potential, its vulnerabilities, its patterns of spread--its future?
By the late 1980s, the entire discipline of cancer prevention appeared to have stalled at this critical juncture. The missing element in the puzzle was a deeper understanding of carcinogenesis--a mechanistic understanding that would explain the means by which normal cells become cancer cells. Chronic inflammation with hepatitis B virus and H. pylori initiated the march of carcinogenesis, but by what route? The Ames test proved that mutagenicity was linked to carcinogenicity, but mutations in which genes, and by what mechanism?
And if such mutations were known, could they be used to launch more intelligent efforts to prevent cancer? Instead of running larger trials of mammography, for instance, could one run smarter trials of mammography--by risk-stratifying women (identifying those with predisposing mutations for breast cancer) such that high-risk women received higher levels of surveillance? Would that strategy, coupled with better technology, capture the identity of cancer more accurately than a simple, static portrait?
Cancer therapeutics, too, had seemingly arrived at the same bottleneck. Huggins and Walpole had shown that knowing the inner machinery of the cancer cell could reveal unique vulnerabilities. But the discovery had to come from the bottom up--from the cancer cell to its therapy. "As the decade ended," Bruce Chabner, former director of the NCI's Division of Cancer Treatment, recalled, "it was as if the whole discipline of oncology, both prevention and cure, had bumped up against a fundamental limitation of knowledge. We were trying to combat cancer without understanding the cancer cell, which was like launching rockets without understanding the internal combustion engine."