The Book of Woe: The DSM and the Unmaking of Psychiatry

by Gary Greenberg


  The session was just a warm-up, an opportunity for Kupfer and Regier to reiterate the shortcomings of the DSM-IV, to talk about paradigm shifts without quite saying whether or not the new book would constitute one, to once again list all the effort, time, and money that had gone into the revision. As he had in Hawaii, Lawson Wulsin, the APA’s liaison to other medical specialties, gave his advice on psychiatry’s ongoing struggle for respectability and money. “Mental illness has been promoted to constituting a respectable public health problem,” he said. This meant that psychiatrists now had a huge opportunity, but only if they could “learn how to work outside their comfort zone, and how to get well paid for it.” The key to this, he continued, was joining other medical specialties in “integrated care settings,” where they could deliver “measurement-based care.” The DSM-5, with its focus on dimensional measures, would be one of the tools psychiatrists could use to “do well and win at that game.”

  The afternoon symposium was the main event—the public announcement of the results of the field trials. Before the packed room could hear the numbers, however, Helena Kraemer once again described her methodology—and the problems with the DSM-III and DSM-IV that she had attempted to remedy. Since she’d presented it the previous year in Hawaii, her critique had become a full-on broadside against Spitzer and Frances. Their sample sizes were too small. They had created conditions that were too pristine. They had invited conflicts of interest, letting work groups design the field trials, allowing clinicians to choose subjects most likely to qualify for the diagnosis in question, taking too much of a hand in adapting the kappa statistic to the DSM-III, focusing too much on maintaining prevalence rates in DSM-IV. “There was a bias in those studies,” she said. Spitzer and Frances had feathered their own nests, but no matter how good the results made their DSMs look, they were “badly inflated and that causes a problem now.”

  The resulting problem—unreasonable expectations of how reliable psychiatric diagnosis should (or could) be—had a solution. It was the downward adjustment she had first suggested at the previous annual meeting, the one where she announced that a kappa of .2 would be “acceptable.”
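The kappa statistic at the center of this dispute measures how often two raters agree beyond what chance alone would produce: observed agreement minus expected agreement, scaled by the maximum possible improvement over chance. A minimal sketch in Python shows the arithmetic (the diagnoses here are hypothetical, and the actual field trials used an intraclass variant of kappa adapted by Kraemer, not this simple two-rater form):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of cases where the raters concur.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement: product of each rater's marginal
    # label frequencies, summed over all labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two clinicians diagnose the same ten patients (invented data):
a = ["MDD", "MDD", "GAD", "MDD", "GAD", "MDD", "none", "GAD", "MDD", "none"]
b = ["MDD", "GAD", "GAD", "MDD", "MDD", "MDD", "none", "none", "MDD", "none"]
print(round(cohens_kappa(a, b), 2))  # prints 0.52
```

The raters above agree on seven of ten patients (70 percent), yet kappa is only about .52, because much of that raw agreement could occur by chance; this gap is why a kappa of .2 strikes many researchers as alarmingly low.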

  “I regret that now,” she said, but not because such a low number was unacceptable. What she actually meant was that a result of .2 “might be accepted.” A very low kappa, she allowed, was a “point of worry,” but by no means should it be automatically discarded. In fact, an overall reduction in reliability numbers would signify not that the criteria (or the tests) were faulty, but that the APA’s researchers had succeeded in duplicating the messiness of the real world.

  By that standard, the field trials were an unalloyed triumph. It was left to Regier to announce the results of the studies conducted at academic medical centers. He led with schizophrenia, which had achieved a kappa of .81 in DSM-III and .76 in DSM-IV, but in the DSM-5 came in at .46. Alcohol Use Disorder scored a .40, compared with .81 in DSM-III. Agreement on Oppositional-Defiant Disorder was .41, much lower than the .66 found in DSM-III and the .55 in DSM-IV. Some of the new disorders received relatively solid ratings—Hoarding Disorder notched a .59, Binge Eating Disorder a .56, and Disruptive Mood Dysregulation Disorder a .50. But others were dismal, like Mixed Anxiety-Depression, whose kappa of less than .01 was deemed “uninterpretable,” as were kappas for four other disorders, including the once stalwart Obsessive-Compulsive Disorder. Results from the personality disorders trials were also confounding; Toronto’s Centre for Addiction and Mental Health, one of Canada’s leading psychiatric hospitals, managed a .75 reliability in identifying Borderline Personality Disorder, while clinicians at the august Menninger Clinic in Houston scored only a .34 using the same criteria. Some reworked disorders did fare better than did their predecessors—PTSD, at .67, was eight points higher than the DSM-IV trial and beat DSM-III by twelve; ADHD was also a few points better than it had been, although Regier had to acknowledge that “we’re still going back and forth” on whether there would be eighteen or twenty-two criteria; the autism spectrum rang up a solid .69, although that was much lower than the .85 of the DSM-IV trial for autism.

  But these high-ish numbers were the exceptions, and if the audience members had managed to recalibrate their expectations, perhaps by reminding themselves that in golf lower is also better, they were not able to suppress a murmur when Regier announced that the kappa for Major Depressive Disorder—whose criteria were, other than the removal of the bereavement exclusion, unchanged—was .32 and that Generalized Anxiety Disorder had scored a paltry .20.

  Something had gone terribly wrong. Those two diagnoses were the Dodge Dart and Ford Falcon of the DSM, simple and reliable and ubiquitous, and if clinicians were unable to agree on who warranted them, there were only a few possible conclusions: that the DSM-III and DSM-IV had been unreliable from the beginning, that the DSM-5 was unreliable, or that the field trials were so deeply flawed that it would be impossible to say with any kind of certainty just how reliable the new book would be.

  Darrel Regier is not a demonstrative man. But even so, he seemed strangely cool, as if he had pumped himself full of Valium before announcing results that were not merely bad but disastrous. Hadn’t he promised all along that the field trials would bear out the revisions and staked his (and the APA’s) reputation, as well as the fate of the DSM-5, on the results? And didn’t the lower kappas and the discrepancies among sites signal a return to the dark days before DSM-III, when diagnoses depended more on where they were rendered and by whom than on what was wrong with the patient? Here he was, announcing a miserable failure, but if he grasped the extent of the debacle, nothing about his delivery showed it.

  That might be because he had an explanation, one that seemed to satisfy him. “It’s important to go back and look at where we were and where we’ve come,” he told us. “We’re in a different era of statistical sophistication now.” Unlike Spitzer and Frances, “we gave [clinicians] a set of options and they had to choose,” he explained. In that unsophisticated era, clinicians “didn’t have other diagnoses to confuse them,” which is why they got such high kappas. But the DSM-5’s “state-of-the-art design” had ensured that they would be confused, and the dismal numbers were the proof of the DSM-5’s validity.

  The problem, in other words, was not in the numbers but in ourselves. We’d swallowed what Spitzer and Frances had dished out; their comfort food had fattened our expectations, and if the new numbers challenged our unschooled palates and proved a little hard to digest, they at least represented the way psychiatric diagnosis works in the real world. We were just too unsophisticated to understand that failure is success.

  • • •

  One failure couldn’t be gussied up, no matter how hard Eve Moscicki, head researcher for the APA’s Practice Research Network, tried. And try she did, as she presented the results of the Routine Clinical Practice trial, the one in which I had participated. She tried the Kraemer gambit, lowering the bar at the outset by explaining that “this is a first-time presentation” that would offer only “a flavor of the results.” She tried Regierian obfuscation, telling us only how many patients had been enrolled, but not how many of the five thousand clinicians who signed up had actually completed the study. (“I don’t have the exact numbers off the top of my head,” she said during the Q&A, but she finally had to acknowledge that only 640 had submitted data on at least one patient.) She tried distraction, blaming the failure on bureaucratic delays and the unexpectedly long software training rather than on the study’s design, its imposition of a near-impossible burden of conducting hours-long interviews using unfamiliar instruments whose clinical value was questionable and whose reimbursement value was zero. She tried the corporate mission-statement approach—reframing the “unique challenges” faced by the APA as “opportunities for innovative resolutions.” She even went Hollywood, calling her talk “Trials, Tribulations, and Triumphs,” as if it were an elevator pitch for a movie about a plucky heroine overcoming adversity.

  If all her bobbing and weaving hadn’t tipped us off to the extent of the fiasco, it became obvious about fifteen minutes into her talk when, after one last reminder that her study was about the feasibility and usefulness of the revisions and not their reliability, she finally flashed some data on the screen—a bar graph depicting how easy (or hard) clinicians found the new criteria to use.

  “For ADHD, the majority of clinicians thought it was very easy or extremely easy,” she said. The same was true, she went on, for autistic disorders, anxiety, and depression. This might have been a bright moment in an otherwise bleak afternoon but for one thing: according to the graph, while the narrowest majority (52 percent) had indeed given a thumbs-up to ADHD and anxiety disorders, the number who thought the autism and depression criteria and measures were very easy or extremely easy to use was below 50 percent. Moscicki didn’t seem to notice this discrepancy between the story she was telling us and the data she was showing us. Perhaps she thought that since she was presenting only a flavor, she was free to add sweeteners to taste, or maybe she just didn’t care what we thought, or figured that no one would point out the discrepancies, no matter how obvious, for the same reason that people are reluctant to mention that a coworker smells bad or has left his fly unzipped: because you really don’t want to embarrass him.

  And up to a point, she was right.

  Moscicki switched from “ease of use” to “usefulness.” She put up the slide about ADHD and autism diagnoses.

  “It looks to me like . . . I want . . .” She trailed off and peered at the slide, which showed even more anemic results than the earlier one. It was as if she had never seen it before, although she may only have been calculating the odds of getting away with this forever. “It looks to me like almost a majority for ADHD thought the criteria were pretty useful, and for autism, clearly the majority thought the criteria . . .”

  A man’s voice rang out in the darkened room. “It’s not a majority,” he said. “Look, thirty-seven plus seven”—the “very” and “extremely useful” numbers—“doesn’t equal fifty.”

  The interrupter, who turned out to be a blogger for Scientific American, didn’t bother asking exactly what “pretty useful” was supposed to mean. He didn’t ask Moscicki if she thought it was kosher to make up a diagnostic entity called trauma, which she acknowledged she had teased out of the anxiety disorders and which looked suspiciously like a category she had cooked up so she could parade its 62 percent favorable rating. He didn’t point out the lunacy of spending all that time (including mine) and money to find out not whether the criteria or the cross-cutting measures were reliable or valid, but rather only whether clinicians liked the DSM-5, as if the APA were looking for Facebook friends. He didn’t raise the question of selection bias, that is, whether or not the same factors that motivated the few volunteers who actually followed through also predisposed them to give the DSM a Like. He didn’t have to do any of this. Nor did he have to deconstruct propaganda or slog through weedy statistics. He just did the simple math and came to the obvious conclusion.

  “This is totally appalling,” he said.

  “It’s okay, it’s okay,” Moscicki replied. It was not clear whom she meant to comfort. “This is a first look. If it’s not a majority, it’s a large number of them.”

  But her antagonist wasn’t buying it.

  “This is deceptive,” he said, as he slung his backpack over his shoulder, spun on his heel, and stormed out.

  Like the kid in the story about the emperor’s clothes, he had managed to say out loud what everyone in the room, or at least those who could add, must have been thinking: that Moscicki had crossed the Frankfurt Line, the one between bullshit and lies.

  • • •

  The conference featured at least one glimmer of good news for nosologists. Regier mentioned it a couple of times in his various talks, but the honor of revealing it went to Charles O’Brien, a University of Pennsylvania psychiatrist and head of the DSM-5 work group for substance-related disorders.

  Before O’Brien got down to the business at hand—his committee’s proposals—he turned to the business of business. “People should understand that when they read things in the newspaper about Pharma influence, I don’t believe it,” he said, as he made the conflict-of-interest disclosure required of every speaker. “We stopped that a long time ago, even though in the past we might have had some consultancies.” O’Brien didn’t say exactly what they had stopped, but it clearly wasn’t the consultancies. Indeed, he was still working for three drug companies. “Only two of them are actually producing drugs that you can prescribe or buy,” he explained, and this work “is really socially important, because there are very few medications available and not many companies are working on this.” The public fails to understand this, and psychiatry (or at least psychiatrists’ income) is the victim of its ignorance.

  This was a riff that could not help but ingratiate O’Brien with his audience. And he needed all the help he could get. He had to explain to his colleagues, many of them skeptical, why his group had eliminated the categories of substance abuse and substance dependence, which the DSM-IV had used to sort out the people who merely get in trouble with drugs from those who get addicted to them. In their place, the committee proposed the supercategory of substance use disorders, which, it said, occurred whenever there was a “problematic pattern” of substance use that led to “clinically significant impairment and distress.” O’Brien’s group, perhaps remembering that no one had yet defined clinical significance, had listed eleven further criteria. If an impaired patient met two of them, he or she had the disorder. So, for instance, if in a twelve-month period you “often” drink “larger amounts or over a longer period than intended,” and experience “craving or strong desire or urge to use alcohol,” you qualified for Alcohol Use Disorder. The few studies that had been done using the new diagnoses indicated that many people looked forward intensely to their next party and, when they got there, took that third martini or extra toke—enough, in fact, to cause some Australian researchers to forecast a DSM-5-related 60 percent increase in the prevalence of drug-related diagnoses.

  O’Brien thought these warnings were balderdash, but he also thought the DSM-IV was balderdash. “I feel free to criticize DSM-IV,” he said, because he’d been part of that revision, which he now characterized as “a bunch of wise men sitting around a table and asking what happens when people start using drugs.”

  “Although we thought we were wise, we were wrong,” he said. “There is no evidence to support this idea of drug abuse.”

  Not that people don’t use drugs to their own or others’ detriment. But the problem isn’t that sometimes the use causes collateral damage (abuse) or becomes habitual (dependence). The problem is “compulsive, out-of-control drug seeking.” O’Brien would have preferred to call the reformulated disorder addiction, but “some people have a kind of allergy to the word,” believing that it carries too much stigma. Avoiding the a-word is “useless,” O’Brien said. “When you have the president talking about addiction to oil, the word has lost its pejorative tone,” and besides, even if the president did mean it pejoratively, addiction is “what the average doctor is going to call it.” But the chair was evidently outvoted, and the anodyne new name won the day.

  Whatever its name, O’Brien had no doubt about the nature of the problem. “Addiction is a brain disease,” he said. Of course, this was the tacit assumption of the DSM, not to mention of psychiatric nosology for the last hundred years: that what psychiatrists were treating were illnesses that originated in the brain, and that someday they would find out exactly where and how. That promise, O’Brien reminded the crowd, had gone unfulfilled. “Let’s take depression or anger or any of the other things we diagnose,” he said. “They’re all subjective. You have to get hints from what the patient says and how they say it, but you have no test for it.”

  On the other hand, “we do have tests for craving,” he said. “I think craving could become the first biomarker in psychiatry. I can show you where it is in the brain.” And so he did, flashing a photo on the screen. “If you’re an addict,” he said, “you’re noticing this person is booting right now.” Actually, you didn’t have to be an addict; the picture featured a tied-off arm, a blood-filled syringe stuck into a tracked-up vein. What would be different if you were an addict—at least it would if you had just been given a shot of carbon-11 raclopride, a radioactive marker, and a PET scanner had just detected its emissions—is the way your brain would light up upon beholding this image. “This is the caudate, this is the putamen,” O’Brien said, pointing to the next slide, a chart of an addict’s neural activity. “There’s a complete correlation here between the subjective feeling of craving and the degree of inhibition of the binding of raclopride.” Similarly, said O’Brien, show an alcoholic an image related to drinking, and you will notice “increased blood flow to the cingulate gyrus, the anterior cingulate, the insula, and the nucleus accumbens down here.” In both cases, you’re seeing disturbances in dopamine metabolism, “the reward system,” as O’Brien put it. You’re seeing addiction—not the experience, which can only be described in words and assayed subjectively, but the thing itself—caught when it thought no one was looking, naked and unmistakable.

  Of course, there was a catch. “The clinician would have to have a brain-imaging machine,” said O’Brien. “But these are getting to be very common,” he added. He didn’t have to explain to this crowd what that really meant: that devices like brain scanners could be huge profit centers, a way to go outside their comfort zone and get well paid for it, as Lawson Wulsin would put it, to win at the game by delivering the measurement-based care that insurers crave.

 
