by Lizzie Stark
While most body tissues—everything from mouth to rectum—can develop cancer, doctors regularly screen healthy people for only a few, among them breast, cervical, colon, rectal, and lung cancer. Scientists are currently studying screening for other cancers, including prostate and skin cancer, according to the National Cancer Institute. We don’t screen people for every cancer, just for the common ones that are easy to detect. Just as it doesn’t make sense to screen, say, both men and women for prostate cancer, because the number-one risk factor for prostate cancer is being a man, it also doesn’t make sense to screen everyone in the general population for comparatively rare cancers like kidney cancer, which is hard to detect anyway. On top of this, screening must do more than simply catch cancer—the goal is to reduce cancer deaths and extend life span by catching cancer early.
But “early” is a relative term when it comes to cancer. Consider two women undergoing screening for breast cancer who have clean mammograms performed on the same day. A year later, at their return appointments, a mammogram discovers a small, slow-growing tumor in one woman’s breast. Her cancer is probably curable through traditional therapy. The other woman also has a small tumor in her breast, but the cancer is aggressive and has already hitched a ride through the lymphatic system to her liver and lungs. She has only a few years to live. Although the time between screenings—one year—is identical, the mammogram has caught the first cancer “early” in the development of the tumor and the second one “late” in the development of the cancer. Catching cancer early means catching it when it’s in a less lethal stage of the disease, and that depends, of course, on what particular strain of cancer you have. One of the arguments against mammography as a screening tool is that it’s good at catching slow-moving cancer that isn’t likely to kill you but pretty bad at catching the aggressive stuff that grows fast.
Detection tests for cancer walk a thin line. Ideally, such tests would catch all cases of cancer—they wouldn’t return positive results for things that looked like cancer but weren’t (false positives leading to overdiagnosis), but they also wouldn’t miss any cases of cancer (false negatives leading to underdiagnosis). And while we’re wishing for an ideal world, perhaps we could have one test that found all cancers and then a second test to tell you which cancers were lethal and which ones weren’t, Brawley suggests. A perfect detection test would be, in the words of Goldilocks, “just right.” But since we don’t live in a fairy tale, in the real world, over- and underdiagnosis are balanced against each other. You can think of cancer screening like a net strung across a river, fishing for salmon. If I weave it tight enough to catch all the salmon that come downstream, chances are good it’ll also catch a bunch of not-salmon—turtles, otters, river weed, stuff that I’m not interested in. On the other hand, if I weave the net loosely enough to let the otters and turtles pass, some of the smaller salmon will swim through—I won’t catch them all. Use a very sensitive test, and you’ll get lots of false positives. Use a less sensitive test, and you’ll get false negatives. It’s pretty hard to win the game.
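To make the net analogy concrete, here is a minimal sketch in Python with invented numbers; the prevalence, sensitivity, and specificity figures are assumptions for illustration, not real statistics for any actual screening test:

```python
# Illustrative only: the numbers below are made up, not real screening data.
# The point is the trade-off: a "tighter net" (higher sensitivity) catches
# more cancers but sweeps up more false positives; a "looser net" (higher
# specificity) raises fewer false alarms but lets more cancers slip through.

def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected counts for a hypothetical screening test."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity
    false_negatives = with_cancer - true_positives        # the salmon that swim through
    false_positives = without_cancer * (1 - specificity)  # the turtles and otters in the net
    return true_positives, false_negatives, false_positives

for label, sens, spec in [("tight net", 0.95, 0.85), ("loose net", 0.75, 0.97)]:
    tp, fn, fp = screening_outcomes(100_000, 0.005, sens, spec)
    print(f"{label}: caught {tp:.0f} cancers, missed {fn:.0f}, false alarms {fp:.0f}")
```

Either way the test pays a price: tighten it and the false alarms pile up; loosen it and cancers get missed.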
Further complicating the picture is that it’s not enough for a screening test merely to catch cancer—it must also increase life span, and that’s tricky to study because of lead-time bias. Siddhartha Mukherjee explains the complications with a thought experiment. Imagine two identical twin women with identical forms of cancer that develop at exactly the same time. The first twin is all about prevention and screening, and after a routine mammogram, doctors discover her cancer in 1995. They treat her with surgery and chemotherapy, but in 2000 she relapses and dies. Her sister dislikes and distrusts cancer surveillance, so she doesn’t go to the doctor until 1999, when she discovers a lump in her breast while showering. The doctors give her some palliative treatment, but she dies in 2000 at the same moment as her sister. Did surveillance help? Well, the first twin survived for five years after diagnosis and the second twin survived only one—doctors might say that the first woman’s surveillance helped her live longer after diagnosis. Yet the two women had the exact same sort of cancer and the same life span. The surveillance simply moved the diagnosis date earlier, a phenomenon scientists call “lead-time bias.” A good detection test, therefore, does not merely make patients survive longer after diagnosis; it increases overall life span. If the first twin had outlived her nonscreening sister, then maybe the regular mammograms would have been worth it. As one can well imagine, it’s hugely complicated to produce a good screening tool that balances over- and underdiagnosis and increases life span.
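For readers who like to see the arithmetic, here is a toy rendering of Mukherjee’s twin thought experiment, using the dates from the text; the point is only that screening can stretch “survival after diagnosis” without adding a day of life:

```python
# Toy version of the lead-time bias thought experiment.
# Both twins develop the same cancer and die at the same moment in 2000;
# only the diagnosis date differs.

twins = {
    "screened twin":   {"diagnosed": 1995, "died": 2000},
    "unscreened twin": {"diagnosed": 1999, "died": 2000},
}

for name, twin in twins.items():
    survival_after_diagnosis = twin["died"] - twin["diagnosed"]
    print(f"{name}: {survival_after_diagnosis} years of survival after diagnosis")

# The screened twin "survives" five years and her sister only one, yet both
# life spans are identical: screening added lead time, not life.
```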
Surveillance rests on the idea that there is a long march from normal cells to atypical cells to cancerous cells and finally to tumors and metastasis. The reasoning goes that if cancer is found when it’s a baby—cute and cuddly—but before it becomes Voltron, it’s easier to kill. This idea of carcinogenesis, that cancer starts as normal cells that slowly become atypical and then malignant, suggests that catching cancer earlier in its development, when it is not yet cancer but “precancer,” can help prevent cancer death. But not all cancer is suited to these methods. It’s a bit like cooking onions—if you put them over high heat, you have to watch them carefully to make sure they don’t burn. But if you put them on low, they take longer to go from raw to burnt, and you don’t have to be so attentive. Let’s say I’m predestined to get cancer. If my cancer is aggressive, the cells in my body might change from normal to atypical to cancerous very quickly, which wouldn’t give researchers much time to disrupt this process. If my cancer grows slowly, surveillance is more likely to help, because there’s a longer window in which treatment can stamp it out. Of course, no one knows in advance quite how high their flame is—how fast their potential cancer might grow.
This is actually one of the cool things about cervical cancer—it tends to grow slowly, over a period of years, beginning on the outside of the cervix and spiraling inward. This long cooking time means medicine has plenty of time to catch it before it becomes lethal. It used to be the leading cause of cancer death in women, but now it’s not even in the top ten, thanks to the Pap smear screening test. And we owe the Pap smear to its namesake George Papanicolaou and, oddly, to the fact that guinea pigs don’t visibly menstruate.
Greek scientist George Papanicolaou arrived penniless in New York in 1913. Despite his training in medicine and zoology, at first he struggled to get a job, so he worked, I kid you not, as a carpet salesman until he landed a position at Cornell University studying guinea pig menstruation. Guinea pigs had a mysterious cycle, as they didn’t bleed or shed much tissue during their monthlies. So Papanicolaou used a nasal speculum to hold them open while he scraped cells off their cervixes and then scrutinized the resulting slides under a microscope; he discovered that the cells waxed and waned in size and shape depending on when in the cycle he’d scraped them. Maybe what was true in guinea pigs was true in humans. By the end of the 1920s, Papanicolaou was studying humans, and as Mukherjee puts it, “His wife Maria, in surely one of the more grisly displays of conjugal fortitude, reportedly allowed herself to be tested by cervical smears every day.” As it turned out, human cervical cells also change depending on the stages of a woman’s menstrual cycle.
This was interesting but hardly revolutionary. I mean, sure, it’s fascinating to know that cervical cells change shape during the menstrual cycle, but no woman needs a pelvic exam and a microscope to tell her Aunt Flo is in town. So rather than focusing on healthy women, Papanicolaou began reading slides from women with many different gynecological problems, from infections to tubal pregnancies and fibroids. In 1928 at a eugenics conference, he announced his shattering discovery. He’d found that smears with abnormal cells in them came from women with abnormal cervixes, and he’d been able to diagnose cervical cancer from such a smear. Over the next decades he refined the procedure, and by 1950 he hit on the idea that its true use wasn’t diagnosing cervical cancer, but catching it early, since cervical cancer begins in an exterior lesion before tunneling inward.
As it turns out, the Pap smear is pretty good at catching cancer when it is a fluffy bunny. A 1952 study of 150,000 women in Shelby County, Tennessee, found 555 cases of invasive cervical cancer and 557 cases of preinvasive cancer or precancerous changes. The clincher was that the women with precancer were about twenty years younger than the ones with cancer. The Pap smear meant doctors could detect cancer about twenty years earlier than they’d previously been able to, and it changed cervical cancer from mostly lethal to mostly curable.
While the cure rates were wonderful, the Pap smear also opened the door to uncertainty. It separated the job of the physician—taking the smear—from the job of the pathologist who read it. And as the American Cancer Society promoted widespread screening in 1945, a new problem presented itself—pathologists didn’t want to search a haystack of slides for the needle of cancer. A new class of worker—the cytotechnician—was born. Poorly paid and mostly female, cytotechnicians screened slides and brought the hinky ones to pathologists for a second look. The increase in manpower, er, womanpower made widespread screening possible. It also relied on human judgment, which meant it was susceptible to human error. A 1956 experiment that sent twenty smears of atypical cells to twenty-five pathologists revealed how deep human error could run. Three pathologists found no cancer, three found one case of cancer, four found nine cases of cancer, two found twelve cases, and two found thirteen cases. In addition, abnormal Pap smears open the door to further procedures—cervical biopsies to confirm the presence of cancer, or cryosurgery or laser therapy to freeze or burn off any atypical areas. These days, hysterectomy is only occasionally recommended, but back in the 1960s, the common practice in the United States was to treat cervical lesions—or any other lady-organ complaint—with the surgery. As Ilana Löwy, a senior researcher at the French National Institute of Health and Medical Research, wrote, although the Pap smear did reduce deaths from cervical cancer, “the low cost of elimination of cervical lesions is not, however, a zero cost,” especially when it comes to precancerous diagnoses. In addition to rare but potentially serious complications, “one of the rarely discussed drawbacks of such screening is an irreversible generation of uncertainty…. Diagnosed with a potentially threatening condition with an uncertain meaning, they [the patients] are not sure if they should see themselves as sick or healthy.”
As a woman positive for a BRCA mutation, I bear this uncertainty doubly, both because I am frequently screened for cancer, and therefore more likely to receive ambiguous results, and because the BRCA test itself is a sort of screening for precancer. I may not have any precancerous lesions inside me, but I have been told that I have a potentially life-threatening mutation inside every cell of my body. After my genetic results came back, I no longer felt like the physically healthy twenty-seven-year-old newlywed that I was. Instead I became someone who went to the doctor more than ten times a year, like a good patient, to make sure I wasn’t sick yet. I lived in a state of betweenness, in a no-man’s-land straddling the worlds of sick and healthy.
The Pap smear’s success as a cancer screening tool influenced the later push for mammography, because it proved that screening could prevent cancer death and that American women were cool with letting doctors nose around their most intimate parts on a yearly basis in order to do so. It also vindicated the idea that healthy cells change slowly into cancerous ones and that disrupting this progression could prevent cancer. However, the later push for mammography also ignored a bunch of stuff that made cervical cancer a good candidate for screening. Doctors can look directly at the opening of your cervix with the naked eye and a speculum—it doesn’t require X-rays or anything—which means screening is easy and not too expensive. Most cervical cancer also moves glacially, like a turtle with a dagger strapped to its head—which means it’s easier to catch early. Nonetheless, the same logic behind the Pap smear would be extended, in the 1960s and 1970s, to mammography, and beyond, to “every cancer we had a test on,” Brawley tells me.
Berlin surgeon Albert Salomon was one of the first people to take photos of breasts—three thousand breasts amputated in mastectomies, that is—using X-rays in 1913. Sure enough, he’d been able to see cancer and mineral deposits. Unfortunately, his studies couldn’t continue because in the mid-1930s the Nazis removed him from his position. He escaped to Amsterdam, leaving behind his technology, which he called mammography. The method didn’t resurface in America until the 1950s and 1960s, when the desire for breast cancer screening converged with radiologists who had been honing their craft since the turn of the century.
As with all screening tests, mammography was a nightmare to evaluate, and only a handful of large-scale studies have investigated its efficacy. When it comes to scientific evidence, there’s a hierarchy, Brawley says. At the top of the hierarchy are prospective randomized trials—big trials with two arms, one whose patients receive whatever treatment is being tested and one whose patients don’t, or who receive the current standard of care. So to figure out whether one regimen of new chemo drugs is better than the standard treatment, for example, patients in one arm of the trial receive the new treatment, and those in the other arm receive the current cocktail. In order for the study to be good, the patients should be randomly assigned to one arm or the other, and the rest of their treatment should be identical. Randomized trials are the gold standard of evidence. Next, there are large cohort studies, which follow large numbers of people for long periods of time and examine risk factors. For example, a large cohort study might look at dads who exercise and dads who don’t, to see if exercise has any bearing on developing prostate cancer. Below that is a case control study, which compares groups of people with different outcomes—say, people with lung cancer and people without—and tries to uncover the differences that might have caused their cancer or lack thereof. And last in line is the opinion of an expert.
There have been prospective randomized trials of mammography—the gold standard—and all eight of them showed that screening saves lives, according to Brawley. Unfortunately, “because you’re dealing with human beings and because you’re dealing with large numbers of human beings, no study is ever going to be as precise as we would like it to be,” he tells me. Most of the studies have potentially large flaws, according to Mukherjee. A 1963 study of eighty thousand New York women aged forty to sixty showed that mammography reduced breast cancer mortality by 40 percent, but, he writes, it was later criticized because the control group hadn’t been told it was part of a trial, which had skewed the numbers. As Brawley tells me, the number of women enrolled in the trial changed over time, making it hard to measure the prospective benefit. Unfortunately for future studies, the New York results prompted the American Cancer Society to launch its massive Breast Cancer Detection Demonstration Project (BCDDP), which enrolled 250,000 women nationwide in 1971. With so many women already participating in this project, it was hard to find enough uninvolved women to run a good study in the United States. So scientists tried in five different Swedish cities, Scotland, and Canada. Pretty much all of them had problems, according to Mukherjee.
The Canadian trial in the early 1980s suffered because nurses randomized patients after taking down their medical histories and steered a disproportionate number of high-risk women and women with abnormal exams into the mammography group, muddling the results. Mukherjee called the Edinburgh study “a disaster” because “doctors assigned blocks of women to the screening or control groups based on arbitrary criteria. Or, worse still, women assigned themselves. Randomization protocols were disrupted. Women often switched between one group and the other as the trial proceeded, paralyzing and confounding any meaningful interpretation of the study as a whole.” Only the Malmö trial of the late 1970s bore any fruit. It studied forty-two thousand women, screening half, and by 1988 the results were in. Mammography had benefited women over fifty-five, reducing breast cancer deaths by 20 percent. Young women had not benefited. As Mukherjee puts it, “This pattern—a clearly discernible benefit for older women, and a barely detectable benefit in younger women—would be confirmed in the scores of studies that followed Malmö.” Of course, years later, a paper in the Journal of the National Cancer Institute suggested that the Malmö numbers were compromised because breast cancer rates among Swedish women may have been spontaneously declining prior to the introduction of the mammogram.
Screening women in their forties for cancer via mammography is still controversial. As Brawley tells me, cancer shows up white on mammograms, and many women in their forties still have dense breasts, which also show up white. Looking for a white tumor on a white background is much harder than looking for one in a less dense, older breast, which shows up grey or black on the scan. On top of this, cancer risk goes up as you age—relatively few women in their forties develop breast cancer, so you have to sift through more of them to see a benefit. Brawley tells me that the data show you’d have to scan 1,904 women in their forties to save one life. And in saving that one life, numerous other women would undergo callbacks and unnecessary biopsies. Mammography saves lives, he says, but for women in their forties it takes a lot of work to save one life, and in the meantime, the data show that some women will have such a bad experience with the scares of screening that they will swear off mammography for the rest of their lives—including in their fifties and sixties, when it is much more effective. These ambiguities around risk and benefit make the wisdom of screening women at age forty unclear.
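For a rough sense of scale, here is a back-of-the-envelope sketch built around Brawley’s figure of 1,904 women screened to save one life; the callback rate and the number of yearly screening rounds are assumptions for illustration, not figures from the text:

```python
# Back-of-the-envelope arithmetic around the number needed to screen.
# ASSUMPTIONS (not from the text): roughly 10% of women are called back for
# extra imaging in any given round, and screening happens yearly for a decade.

number_needed_to_screen = 1904   # women in their forties screened per life saved (Brawley)
callback_rate = 0.10             # assumed fraction recalled per screening round
rounds = 10                      # assumed annual screens across the decade

expected_callbacks = number_needed_to_screen * callback_rate * rounds
print("Lives saved: 1")
print(f"Expected callbacks among those {number_needed_to_screen} women: {expected_callbacks:.0f}")
```

Under these made-up numbers, the callbacks outnumber the single life saved by more than a thousand to one, which is the trade-off Brawley is describing.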
What Brawley says about screenings scaring some women really resonates with me. BRCA women are told to scan their breasts even more often, and I wonder whether that makes us—a high-risk group that may be more frightened of cancer after watching family members die—even more likely to avoid the emotional ordeal of screening. The numbers show that of the BRCA women who decide on surveillance, most are vigilant for the first few years, but within five years the vast majority no longer undergo the recommended regimen.