Siddhartha Mukherjee - The Emperor of All Maladies: A Biography of Cancer
For Papanicolaou, too, this brooding, contemplative period was like a personal camera lucida that magnified and reflected old experimental themes onto new ones. A decades-old thought returned to haunt him: if normal cells of the cervix changed morphologically in graded, stepwise fashion over time, might cancer cells also change morphologically in time, in a slow, stepwise dance from normal to malignant? Like Auerbach (whose work was yet to be published), could he identify intermediate stages of cancer--lesions slouching their way toward full transformation?
At a Christmas party in the winter of 1950, challenged by a tipsy young gynecologist in his lab to pinpoint the precise use of the smear, Papanicolaou verbalized a strand of thought that he had been spinning internally for nearly a decade. The thought almost convulsed out of him. The real use of the Pap smear was not to find cancer, but rather to detect its antecedent, its precursor--the portent of cancer.
"It was a revelation," one of his students recalled. "A Pap smear would give a woman a chance to receive preventive care [and] greatly decrease the likelihood of her ever developing cancer." Cervical cancer typically arises in an outer layer of the cervix, then grows in a flaky, superficial whirl before burrowing inward into the surrounding tissues. By sampling asymptomatic women, Papanicolaou speculated that his test, albeit imperfect, might capture the disease at its first stages. He would, in essence, push the diagnostic clock backward--from incurable, invasive cancers to curable, preinvasive malignancies.
In 1952, Papanicolaou convinced the National Cancer Institute to launch the largest clinical trial of secondary prevention in the history of cancer using his smearing technique. Nearly every adult female resident of Shelby County, Tennessee--150,000 women spread across eight hundred square miles--was tested with a Pap smear and followed over time. Smears poured in from hundreds of sites: from one-room doctors' offices dotted among the horse farms of Germantown to large urban community clinics scattered throughout the city of Memphis. Temporary "Pap clinics" were set up in factories and office buildings. Once collected, the samples were funneled into a gigantic microscope facility at the University of Tennessee, where framed photographs of exemplary normal and abnormal smears had been hung on the walls. Technicians read slides day and night, looking up from the microscopes at the pictures. At the peak, nearly a thousand smears were read every day.
As expected, the Shelby team found its fair share of advanced cancerous lesions in the population. In the initial cohort of about 150,000, invasive cervical cancer was found in 555 women. But the real proof of Papanicolaou's principle lay in another discovery: astonishingly, 557 women were found to have preinvasive cancers or even precancerous changes--early-stage, localized lesions curable by relatively simple surgical procedures. Nearly all these women were asymptomatic; had they never been tested, they would never have been suspected of harboring preinvasive lesions. Notably, the average age of diagnosis of women with such preinvasive lesions was about twenty years lower than the average age of women with invasive lesions--once again corroborating the long march of carcinogenesis. The Pap smear had, in effect, pushed the clock of cancer detection backward by nearly two decades, and changed the spectrum of cervical cancer from predominantly incurable to predominantly curable.
A few miles from Papanicolaou's laboratory in New York, the core logic of the Pap smear was being extended to a very different form of cancer. Epidemiologists think about prevention in two forms. In primary prevention, a disease is prevented by attacking its cause--smoking cessation for lung cancer, or a vaccine against hepatitis B for liver cancer. In secondary prevention (also called screening), a disease is forestalled by detecting it at an early, presymptomatic stage. The Pap smear was invented as a means of secondary prevention for cervical cancer. But if a microscope could detect a presymptomatic state in scraped-off cervical tissue, then could another means of "seeing" cancer detect an early lesion in another cancer-afflicted organ?
In 1913, a Berlin surgeon named Albert Salomon had certainly tried. A dogged, relentless champion of the mastectomy, Salomon had whisked nearly three thousand breasts amputated during mastectomies to an X-ray room, where he photographed them to detect the shadowy outlines of cancer. Salomon had detected stigmata of cancer in his X-rays--microscopic sprinkles of calcium lodged in cancer tissue ("grains of salt," as later radiologists would call them) or thin, crablike fingerlings of malignant cells reminiscent of the origin of the word cancer.
The next natural step might have been to image breasts before surgery as a screening method, but Salomon's studies were rudely interrupted. Abruptly purged from his university position by the Nazis in the mid-1930s, Salomon escaped the camps, fled to Amsterdam, and vanished underground--and so, too, did his shadowy X-rays of breasts. Mammography, as Salomon called his technique, languished in neglect. It was hardly missed: in a world obsessed with radical surgery, where small and large masses in the breast were treated with precisely the same gargantuan operation, screening for small lesions made little sense.
For nearly two decades, the mammogram thus lurked about in the far peripheries of medicine--in France and England and Uruguay, places where radical surgery held the least influence. But by the mid-1960s, with Halsted's theory teetering uneasily on its pedestal, mammography reentered X-ray clinics in America, championed by pioneering radiographers such as Robert Egan in Houston. Egan, like Papanicolaou, cast himself more as an immaculate craftsman than a scientist--a photographer, really, who was taking photographs of cancer using X-rays, the most penetrating form of light. He tinkered with films, angles, positions, and exposures, until, as one observer put it, "trabeculae as thin as a spider's web" in the breast could be seen in the images.
But could cancer be caught in that "spider's web" of shadows, trapped early enough to prevent its spread? Egan's mammograms could now detect tumors as small as a few millimeters, about the size of a grain of barley. But would screening women to detect such early tumors and extricating the tumors surgically save lives?
Screening trials in cancer are among the most slippery of all clinical trials--notoriously difficult to run, and notoriously susceptible to errors. To understand why, consider the odyssey of a screening test for cancer from the laboratory to the clinic. Suppose a new test has been invented in the laboratory to detect an early, presymptomatic stage of a particular form of cancer--say, the level of a protein secreted by cancer cells into the serum. The first challenge for such a test is technical: its performance in the real world. Epidemiologists think of screening tests as possessing two characteristic performance errors. The first error is overdiagnosis--when an individual tests positive but does not actually have cancer. Such individuals are called "false positives." Men and women who falsely test positive find themselves trapped in the punitive stigma of cancer, the familiar cycle of anxiety and terror (and the desire to "do something") that precipitates further testing and invasive treatment.
The mirror image of overdiagnosis is underdiagnosis--an error in which a patient truly has cancer but does not test positive for it. Underdiagnosis falsely reassures patients of their freedom from disease. These men and women ("false negatives" in the jargon of epidemiology) enter a different punitive cycle--of despair, shock, and betrayal--once their disease, undetected by the screening test, is eventually uncovered when it becomes symptomatic.
The trouble is that overdiagnosis and underdiagnosis are often intrinsically conjoined, locked perpetually on two ends of a seesaw. Screening tests that strive to limit overdiagnosis--by narrowing the criteria by which patients are classified as positive--often pay the price of increasing underdiagnosis, because they miss patients who lie in the gray zone between positive and negative. An example helps to illustrate this trade-off. Suppose--to use Egan's vivid metaphor--a spider is trying to invent a perfect web to capture flies out of the air. Increasing the density of that web, she finds, certainly increases the chances of catching real flies (true positives), but it also increases the chances of capturing junk and debris floating through the air (false positives). Making the web less dense, in contrast, decreases the chances of catching real prey, but every time something is captured, the chances are higher that it is a fly. In cancer, where both overdiagnosis and underdiagnosis come at high costs, finding that exquisite balance is often impossible. We want every cancer test to operate with perfect specificity and sensitivity. But the technologies for screening are not perfect. Screening tests thus routinely fail because they cannot even cross this preliminary hurdle--the rate of over- or underdiagnosis is unacceptably high.
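The seesaw can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a hypothetical serum-marker test, with invented, overlapping distributions of marker levels in healthy and cancerous patients (none of these numbers come from the text), and it shows how tightening the positivity criterion trades one error for the other.

```python
# A minimal sketch of the overdiagnosis/underdiagnosis seesaw, assuming a
# hypothetical serum-marker screening test. The distributions below are
# invented for illustration; none of these numbers come from the text.
import random

random.seed(0)

# Marker levels overlap between healthy and cancerous patients -- the
# "gray zone" between positive and negative.
healthy = [random.gauss(1.0, 0.5) for _ in range(10_000)]
cancer = [random.gauss(2.0, 0.5) for _ in range(10_000)]

def screen(threshold):
    """Call anyone at or above `threshold` positive; return the two error rates."""
    false_pos = sum(x >= threshold for x in healthy) / len(healthy)  # "overdiagnosis"
    false_neg = sum(x < threshold for x in cancer) / len(cancer)     # "underdiagnosis"
    return false_pos, false_neg

# Tightening the criterion (a sparser web) trades one error for the other.
for threshold in (1.0, 1.5, 2.0, 2.5):
    fp, fn = screen(threshold)
    print(f"threshold={threshold:.1f}  false positives={fp:.1%}  false negatives={fn:.1%}")
```

Run as written, the printout shows the false-positive rate falling and the false-negative rate climbing as the threshold rises: the denser web catches more debris; the sparser web loses more flies.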
Suppose, however, our new test does survive this crucial bottleneck. The rates of overdiagnosis and underdiagnosis are deemed acceptable, and we unveil the test on a population of eager volunteers. Suppose, moreover, that as the test enters the public domain, doctors immediately begin to detect early, benign-appearing, premalignant lesions--in stark contrast to the aggressive, fast-growing tumors seen before the test. Is the test to be judged a success?
No; merely detecting a small tumor is not sufficient. Cancer demonstrates a spectrum of behavior. Some tumors are inherently benign, genetically determined never to reach the fully malignant state; others are intrinsically aggressive, and even intervention at an early, presymptomatic stage might make no difference to the prognosis of a patient. To address the inherent behavioral heterogeneity of cancer, the screening test must go further. It must increase survival.
Imagine, now, that we have designed a trial to determine whether our screening test increases survival. Two identical twins, call them Hope and Prudence, live in neighboring houses and are offered the trial. Hope chooses to be screened by the test. Prudence, suspicious of overdiagnosis and underdiagnosis, refuses to be screened.
Unbeknownst to Hope and Prudence, identical forms of cancer develop in both twins at the exact same time--in 1990. Hope's tumor is detected by the screening test in 1995, and she undergoes surgical treatment and chemotherapy. She survives five additional years, then relapses and dies in 2000, ten years after her tumor first developed. Prudence, in contrast, detects her tumor only when she feels a growing lump in her breast in 1999. She, too, has treatment, with some marginal benefit, then relapses and dies at the same moment as Hope, in 2000.
At the joint funeral, as the mourners stream by the identical caskets, an argument breaks out among Hope's and Prudence's doctors. Hope's physicians insist that she had a five-year survival: her tumor was detected in 1995 and she died in 2000. Prudence's doctors insist that her survival was one year: Prudence's tumor was detected in 1999 and she died in 2000. Yet both cannot be right: the twins died from the same tumor at the exact same time. The solution to this seeming paradox--called lead-time bias--is immediately obvious. Using survival as an end point for a screening test is flawed because early detection pushes the clock of diagnosis backward. Hope's tumor and Prudence's tumor possess exactly identical biological behavior. But since doctors detected Hope's tumor earlier, it seems, falsely, that she lived longer and that the screening test was beneficial.
So our test must now cross an additional hurdle: it must reduce mortality, not merely lengthen survival as measured from diagnosis. The only appropriate way to judge whether Hope's test was truly beneficial is to ask whether Hope lived longer regardless of the time of her diagnosis. Had Hope lived until 2010 (outliving Prudence by a decade), we could legitimately have ascribed a benefit to the test. Since both women died at the exact same moment, we now discover that screening produced no benefit.
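The parable reduces to a few lines of arithmetic. The toy sketch below uses only the dates given above, and shows that survival measured from diagnosis diverges between the twins while mortality does not; the entire apparent gap is lead time.

```python
# A toy rendering of the Hope-and-Prudence parable, using only the dates
# given in the text, to make lead-time bias concrete.

tumor_onset = 1990          # identical tumors arise in both twins
death_year = 2000           # both twins die at the same moment

hope_diagnosis = 1995       # found early, by the screening test
prudence_diagnosis = 1999   # found late, as a symptomatic lump

# "Survival" measured from diagnosis is inflated by early detection alone:
print("Hope's survival:    ", death_year - hope_diagnosis, "years")      # 5
print("Prudence's survival:", death_year - prudence_diagnosis, "years")  # 1

# Mortality is untouched by the test: both twins die at the same age.
# The apparent gap is lead time -- the diagnosis clock pushed backward:
print("Lead time:", prudence_diagnosis - hope_diagnosis, "years")        # 4
```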
A screening test's path to success is thus surprisingly long and narrow. It must avoid the pitfalls of overdiagnosis and underdiagnosis. It must steer past the narrow temptation to use early detection as an end in itself. Then, it must navigate the treacherous straits of bias and selection. "Survival," seductively simple, cannot be its end point. And adequate randomization at each step is critical. Only a test capable of meeting all these criteria--proving mortality benefit in a genuinely randomized setting with an acceptable over- and underdiagnosis rate--can be judged a success. With the odds stacked so steeply, few tests are powerful enough to withstand this level of scrutiny and truly provide benefit in cancer.
In the winter of 1963, three men set out to test whether screening a large cohort of asymptomatic women using mammography would prevent mortality from breast cancer. All three, outcasts from their respective fields, were seeking new ways to study breast cancer. Louis Venet, a surgeon trained in the classical tradition, wanted to capture early cancers as a means to avert the large and disfiguring radical surgeries that had become the norm in the field. Sam Shapiro, a statistician, sought to invent new methods to mount statistical trials. And Philip Strax, a New York internist, had perhaps the most poignant of reasons: he had nursed his wife through the torturous terminal stages of breast cancer in the mid-1950s. Strax's attempt to capture preinvasive lesions using X-rays was a personal crusade to unwind the biological clock that had ultimately taken his wife's life.
Venet, Strax, and Shapiro were sophisticated clinical trialists: right at the outset, they realized that they would need a randomized, prospective trial using mortality as an end point to test mammography. Methodologically speaking, their trial would recapitulate Doll and Hill's famous smoking study of the 1950s. But how might such a trial be logistically run? The Doll and Hill study had been the fortuitous by-product of the nationalization of health care in Great Britain--its stable cohort produced, in large part, by the National Health Service's "address book" of registered doctors across the United Kingdom. For mammography, in contrast, it was the sweeping wave of privatization in postwar America that provided the opportunity to run the trial. In the summer of 1944, lawmakers in New York unveiled a novel program to provide subscriber-based health insurance to groups of employees. This program, called the Health Insurance Plan (HIP), was the ancestor of the modern HMO.
The HIP filled a great void in insurance. By the mid-1950s, a triad of forces--immigration, World War II, and the Depression--had brought women out of their homes to comprise nearly one-third of the total workforce in New York. These working women sought health insurance, and the HIP, which allowed its enrollees to pool risks and thereby reduce costs, was a natural solution. By the early 1960s, the plan had enrolled more than three hundred thousand subscribers spread across thirty-one medical groups in New York--nearly eighty thousand of them women.
Strax, Shapiro, and Venet were quick to identify the importance of the resource: here was a defined--"captive"--cohort of women spread across New York and its suburbs that could be screened and followed over a prolonged time. The trial was kept deliberately simple: women enrollees in the HIP between the ages of forty and sixty-four were divided into two groups. One group was screened with mammography while the other was left unscreened. The ethical standards for screening trials in the 1960s made the identification of the groups even simpler. The unscreened group--i.e., the one not offered mammography--was not even required to give consent; it could just be enrolled passively in the trial and followed over time.
The trial, launched in December 1963, was instantly a logistic nightmare. Mammography was cumbersome: a machine the size of a full-grown bull; photographic plates like small windowpanes; the slosh and froth of toxic chemicals in a darkroom. The technique was best performed in dedicated X-ray clinics, but unable to convince women to travel to these clinics (many of them located uptown), Strax and Venet eventually outfitted a mobile van with an X-ray machine and parked it in midtown Manhattan, alongside the ice-cream trucks and sandwich vendors, to recruit women into the study during lunch breaks.*
Strax began an obsessive campaign of recruitment. When a subject refused to join the study, he would call, write, and call her again to persuade her to reconsider. The clinics were honed to machinelike precision to allow thousands of women to be screened in a day:
"Interview . . . 5 stations X 12 women per hour = 60 women. . . . Undress-Dress cubicles: 16 cubicles X 6 women per hour = 96 women per hour. Each cubicle provides one square of floor space for dress-undress and contains four clothes lockers for
a total of 64. At the close of the 'circle,' the woman enters the same cubicle to obtain her clothes and dress. . . . To expedite turnover, the amenities of chairs and mirrors are omitted."
Curtains rose and fell. Closets opened and closed. Chairless and mirrorless rooms let women in and out. The merry-go-round ran through the day and late into the evening. In an astonishing span of six years, the trio completed a screening effort that would ordinarily have taken two decades.
If a tumor was detected by mammography, the woman was treated according to the conventional intervention available at the time--surgery, typically a radical mastectomy, to remove the mass (or surgery followed by radiation). Once the cycle of screening and intervention had been completed, Strax, Venet, and Shapiro could watch the experiment unfold over time by measuring breast cancer mortality in the screened versus unscreened groups.
In 1971, eight years after the study had been launched, Strax, Venet, and Shapiro revealed the initial findings of the HIP trial. At first glance, it seemed like a resounding vindication of screening. Sixty-two thousand women had been enrolled in the trial; about half had been screened by mammography. There had been thirty-one deaths in the mammography-screened group and fifty-two deaths in the control group. The absolute number of lives saved was admittedly modest, but the fractional reduction in mortality from screening--almost 40 percent--was remarkable. Strax was ecstatic: "The radiologist," he wrote, "has become a potential savior of women--and their breasts."
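The "almost 40 percent" figure follows directly from the two death counts quoted above. A quick back-of-the-envelope check (since the two arms were of roughly equal size, the group sizes themselves drop out of the ratio):

```python
# Back-of-the-envelope check of the HIP trial's headline result, using
# only the death counts quoted above. The two arms were of roughly equal
# size, so the crude ratio of deaths approximates the mortality ratio.

deaths_screened = 31   # breast cancer deaths, mammography-screened group
deaths_control = 52    # breast cancer deaths, unscreened control group

lives_saved = deaths_control - deaths_screened     # 21 -- the modest absolute gain
relative_reduction = lives_saved / deaths_control  # ~0.404

print(f"Absolute deaths averted: {lives_saved}")
print(f"Fractional mortality reduction: {relative_reduction:.0%}")  # ~40%
```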
The positive results of the HIP trial had an explosive effect on mammography. "Within 5 years, mammography has moved from the realm of a discarded procedure to the threshold of widespread application," a radiologist wrote. At the National Cancer Institute, enthusiasm for screening rose swiftly to a crescendo. Arthur Holleb, the American Cancer Society's chief medical officer, was quick to note the parallel to the Pap smear. "The time has come," Holleb announced in 1971, "for the . . . Society to mount a massive program on mammography just as we did with the Pap test. . . . No longer can we ask the people of this country to tolerate a loss of life from breast cancer each year equal to the loss of life in the past ten years in Viet Nam. The time has come for greater national effort. I firmly believe that time is now."