“Well, that’s a very complex question,” Narrow replies, and proceeds not to answer it, except to say that they had tried to figure it out in a pilot study and it hadn’t worked out.
But First asks the question again. Narrow acknowledges that not only are they not requiring a structured interview, they are not even training the study clinicians on the new diagnoses or “telling them how they should be interpreting these measures.” They are simply asking them to familiarize themselves with the website. But, he assures us, this is not a weakness in the design but a strength: the field trials will mirror how clinicians practice in the real world and thus yield more realistic results than the DSM-III and DSM-IV field trials did. Those old numbers, the ones that did so much to restore psychiatry’s respectability, Narrow is saying, were overstated, inflated by the pristine conditions under which they were conducted. But now that the APA has cleverly dirtied up the trials, Narrow tells us, he can almost guarantee that reliability will be worse than it was in earlier DSMs.
I have underestimated First. He has managed to unearth the point of the exercise after all: to prepare us all for the lousy outcomes the field trials were evidently designed to yield. As we file out of the room, I ask him what it’s like to be a bystander to the proceedings. “Oh, it’s absolutely excruciating,” he answers, as if that were obvious. Which, come to think of it, it is.
• • •
Narrow was correct about at least one thing. As Helena Kraemer, the chief statistician on the DSM-5 task force, told a much larger crowd the next day, “People’s expectations of what reliability should be have been grossly inflated.” She left no question about who was responsible for this: Bob Spitzer.
Spitzer knew it was not enough to ask two doctors to diagnose a patient, compare their answers, and use the results to pronounce judgment on whether the diagnosis was reliable. That approach wouldn’t account for the possibility that the clinicians agreed by chance—by, say, flipping a coin or tossing a dart or just plain guessing—rather than because the diagnostic criteria were well written. Fortunately for Spitzer, in 1960 a statistician named Jacob Cohen had invented a method for calculating the extent to which agreement between two people using the same rating scale is the result of factors other than chance. The statistic had come to be known as Cohen’s kappa, and Spitzer, working with Cohen, had adapted it for use in evaluating the reliability of diagnoses.
Spitzer and Cohen introduced kappa to psychiatrists in 1967, promoting it as a way out of the reliability mess. At first, they used it primarily to quantify just how bad things were, and this agenda shaped the way they addressed a problem built into the statistic. A kappa of 0 indicates that any agreement is by chance alone; a kappa of 1 indicates that researchers have come to the same conclusion for nonrandom reasons (presumably because the criteria work). But what do the numbers in between mean? How much agreement is sufficient to call a diagnosis reliable (or not)? After all, even a low kappa means that clinicians outperformed coin tossers or monkeys at typewriters.
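For readers who want the arithmetic, kappa is conventionally computed as observed agreement minus chance agreement, divided by one minus chance agreement. What follows is only a minimal sketch in Python, not anything drawn from Spitzer's or Cohen's papers; the function name `cohens_kappa` and the ten hypothetical diagnoses are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of labels")
    n = len(rater_a)
    # Observed agreement: share of cases where the two labels match.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: probability that both pick the same label if each
    # rater assigned labels independently at his or her own base rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: two clinicians diagnosing the same ten patients.
clinician_1 = ["MDD", "MDD", "GAD", "GAD", "MDD", "GAD", "MDD", "GAD", "MDD", "GAD"]
clinician_2 = ["MDD", "MDD", "GAD", "MDD", "MDD", "GAD", "GAD", "GAD", "MDD", "GAD"]
print(round(cohens_kappa(clinician_1, clinician_2), 2))
```

In this made-up case the two clinicians agree on eight of ten patients, but because each label is used half the time, half of that agreement would be expected by chance, and the kappa lands well below the raw 80 percent. That gap is what the statistic is meant to expose.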
This has turned out to be a hotly contested question, or at least as hot as anything in statistics gets. In 1974, Spitzer proposed an answer. Kappas of around .40, he said, indicated “poor” agreement, .55 was “no better than fair,” .70 was “only satisfactory,” and more than .80 would be “uniformly high.” But as California professors Stuart Kirk and Herb Kutchins noted, Spitzer “could have employed very good, good, not so good, and bad,” and they pointed out that there was a reason he didn’t. Spitzer’s 1974 paper was an attempt to put numbers to the widely noted poor reliability of DSM-II diagnoses, “belittling the reliability of the past,” as Kirk and Kutchins put it, in order to set the stage for the transition to a criterion-based future.
Having established kappa as the arbiter of reliability and having promised success for his new diagnostic approach, Spitzer then had to show that descriptive diagnosis would achieve high kappas. And the DSM-III field trials did exactly that, in no small part because the patients were preselected for the likelihood that they would qualify for the diagnosis and the researchers were drilled on the criteria—and on a clinical interview that left little to chance or imagination. Now that his ox was in danger of being gored, Spitzer had also come up with some new ways to evaluate kappa—no longer was .70 “only satisfactory”; now it was a “high kappa [that] indicates good agreement.” And by this standard, the DSM-III field trials were a success, far outpacing results using the earlier manuals and, in many cases, well exceeding the .70 threshold.
If psychiatrists noticed that the fix had been in from the start, they didn’t say so—most likely because the numbers gave the profession exactly what it had been looking for. But just as Spitzer once denigrated the past to clear the way for the DSM-III, Kraemer was now claiming that Spitzer’s results were patently ridiculous. “I think it will be a miracle if we get a kappa of .80,” she said. “In fact, if someone comes to me and tells me we’ve got one, I’m going to tell her to go back and make sure there hasn’t been some kind of screwup.”
Kraemer didn’t seem worried about the implications of repudiating the very same statistics that the APA had been using for thirty years to stake its claim to scientific respectability or, for that matter, of calling Bob Spitzer a screwup. But she was telling us what numbers we could expect and what they would mean—and she was singing a much different tune from Spitzer’s.
Most of the kappas, Kraemer predicted, would come in somewhere between .40 and .60—the same reliability Spitzer characterized as “no better than fair.” But doctors in other fields were satisfied with reliability like this, she said. There didn’t seem to be any of those doctors around to object to this—say, an endocrinologist who could testify as to whether it would truly be a miracle if he and a random colleague agreed most of the time on whether a patient with high blood sugar had diabetes—but Kraemer seemed to anticipate the danger, and she tacked. “Until you can do X-rays, until you can do scans, until you can do tissue samples,” she said, psychiatric diagnoses could only “aspire to be as good as medical diagnosis.” In the meantime, she added, even if the kappa was only between .20 and .40—numbers that Spitzer didn’t even deign to characterize—“it will be acceptable.”
First was back at the audience microphone. I knew better than to think he would ask exactly how (and to whom) these numbers were acceptable or whether lowering expectations was really such a good idea, given how central reliability had been to reestablishing psychiatry’s credibility, or how the APA was going to convince the public that sharply reduced reliability was actually an improvement over the earlier product.
Still, his question, a version of the one he had asked Narrow, was plenty barbed. Let’s say you end up with a kappa of .10 on the field trial, he said—a result that wouldn’t clear even Kraemer’s very low bar. “Presumably they are going to use this information to make the DSM-5 better before it is released.” But, he continued, given the impossibility of sorting out the way clinicians used the criteria from the criteria themselves, “how can people find out where the problem is, and how do you know how to fix it? How will they even know where to look?”
“This is why I use the analogy of field testing and airplanes,” Kraemer replied. “The airplane crashes, the question is why did it crash and what are you going to do about it? There’s a lot of information they”—I think she meant the APA, not the National Transportation Safety Board—“can look at, but it’s not a matter of analyzing the data to find out exactly what’s wrong.” Kraemer seemed to be saying that the point wasn’t to sift through the wreckage and try to prevent another catastrophe but, evidently, to crash the plane and then announce that the destruction could have been a lot worse.
To be honest, however, I wasn’t sure. She was not making all that much sense, or maybe I just didn’t grasp the complexities of statistical modeling. And besides, I was distracted by a memory of something Steve Hyman once wrote. Fixing the DSM, finding another paradigm, getting away from its reifications—this, he said, was like “repairing a plane while it is flying.” It was a suggestive analogy, I thought at the time, one that recognized the near impossibility of the task even as it indicated its high stakes—and the necessity of keeping the mechanics from swearing and banging too loudly, lest the passengers start asking for a quick landing and a voucher on another airline. But whatever Hyman meant, I was now pretty certain that he wasn’t thinking that the solution was to fly the plane into the ground on purpose, to put the wounded craft out of its misery before it plummeted to earth of its own accord.
First yielded the mic and went back to the end of the line for another round. The next questioner wanted to know why dimensional assessments (an earlier topic of the panel) hadn’t been in DSM-IV. “Actually, you can turn around and ask Michael,” Kraemer said.
“The answer is that there are dimensional measures,” First said. He named them: the severity scales for various disorders, the one-hundred-point scale to rate overall functioning, and a numerical scale to rate the psychosocial stressors a patient was facing. “But the reception among clinicians has been a resounding ‘We’re not interested.’”
His answer was game, and accurate, but it didn’t really matter. He had lost the match. As Kraemer had just made all too clear, she was up there with Regier and Kupfer and he was down on the floor, just another guy in line—and one who, she had just implied, was responsible for the mess they had spent the last ten years trying to clean up.
• • •
“We’d better get smart about measuring what we do and proving its value,” psychiatrist Lawson Wulsin told another audience at the annual meeting, “or we are going to lose.” Wulsin was the liaison between the DSM-5 task force and other medical specialties, so even if the APA’s plane-crash logic made any sense to him, he had to know that his nonpsychiatrist colleagues weren’t going to fall for this strategy, especially not if it hinged on claiming that, at least when it came to diagnosis, the rest of medicine was just as unreliable as psychiatry.
But it was too late to change course. Spitzer had chosen kappa as the pole star and used it to steer psychiatry off the shoals. He gave his profession numbers to cite as evidence that it was sailing in the same seas as the rest of medicine. So the proof of the DSM-5’s success would also have to be in numbers. With the field trials only just under way, it remained to be seen if the leaders of the APA had really gotten smart about proving they could measure what they do and sell us on its value.
Chapter 15
“You have to understand,” Allen Frances is telling me. “What Bob Spitzer was interested in, no one else was. I certainly wasn’t. Because we were all discovering the meaning of life in every patient.”
It’s July 2011. I’ve known Frances for nearly a year, a year in which I’ve had no luck getting him to explain to me how a man comes to scorch the earth he once strode, how, for that matter, he came to walk it in the first place, how you get to be America’s top psychiatrist without just a smidgen of ambition—and, while he’s at it, to help me understand how he can simultaneously believe that DSM disorders are not real but that the book nonetheless deserves its authority, how he can both prize the truth and champion the noble lie, and how these contradictions can fit in one life. We’ve exchanged hundreds of e-mails and sent each other books in the real mail. We’ve spent four or five days together, the last two with his wife, Donna Manning, and his two teenage grandchildren. I’ve slept on his living room floor, watched Gunsmoke on his bedroom TV, eaten at his table, sat squeezed between the grandkids in the backseat of his convertible, buffeted by the wind on an excursion down to Big Sur. He’s ribbed me mercilessly about, among other things, my “naiveté” in thinking that the unfolding debacle of the DSM-5 is about anything more than the rank incompetence of its architects, that in insisting that deep historical forces are at work, I am playing Abbott to the APA’s Costello. He has even given me the Herb Peyser treatment, leaning across the table and pinching my cheek (he spared me the kiss) when I tried to explain to him the dance between editor and writer that had landed his “bullshit” comment in the lead of my Wired story—a pinch affectionate and hostile in equal parts and, most of all, devastatingly effective in stopping my explanation cold, the sudden silence thick with rebuke.
For a year he has evaded the question. He’s demurred on tactical grounds—“If it seems like this is coming from a personality quirk,” he told me, “then the message will get lost.” He’s protested that he’s “just no good at introspection,” and when floating like a butterfly has failed him, he’s stung me with the accusation that I care about all the wrong things, that these questions are worthy of People and not a serious book about the DSM. But as we wind through the California hills early on a July morning, top up, no grandkids, he poses my question perfectly: “You mean, how could a person be interested in the drama of human emotions and psychology and at the same time spend seven years of his life trying to be precise about things at the descriptive surface?” This gets him to reminiscing about the old days with Spitzer at Columbia, and I’m thinking that maybe this year of getting teased and scolded and lectured has all been a prolonged hazing, that I have passed some kind of test, that he is finally going to untangle some of these contradictions.
“The meaning of life,” I repeat. “And what was that?”
“That I’m an even bigger schmuck than I think I am.”
It’s all a little hard to swallow, this brilliant, erudite, effortlessly dominating man insisting that he is stupid and feckless, and evading an account of his own motivation in the bargain. But then again, this is the man who put a disclaimer at the front of the DSM as a prophylaxis against people taking the book too seriously, who thinks that insisting mental disorders don’t actually exist even as you enumerate them will somehow make people stop acting as if the disorders are real. He might really think that humility is enough to erase power, that what Montesquieu really meant was that when aristocrats exercised modesty, it somehow stopped them from having more money than everyone else.
“The least important person on the field is the general,” Frances is saying. “The battle is won by an individual soldier deciding to stand, and having his buddies follow him.” He’s wrested the conversation back to the subject he wants to talk about the most: the small but determined claque of DSM-5 resisters that has recently coalesced around him, attracted, it seems, to his irresistible combination of gravitas and dissent. The soldier in question is a Floridian named Dayle Jones. She’s a counselor, a blogger for the American Counseling Association (ACA), and a member of the committee revising the ICD, and she has been assembling her own case against the DSM-5, trying to warn the 115,000 members of her organization that the new manual will be a disaster for them. “She could be, when all is said and done, the most influential person in all of DSM-5,” Frances says. The APA may be free to ignore him, he explains, but Dayle Jones has been telling the ACA members that they can get the ICD at no cost, and the APA is surely “going to listen to 115,000 buyers.”
The general might be unimportant (and reluctant), but that doesn’t mean he isn’t leading the battle. “I’m doing several things to help [Jones],” he told me, but “it’s better that they’re not seen as coming from me.” He thinks the APA is more likely to listen to Jones and her organization, and to the other dissenters, if they are not seen as affiliated with him. Or with me, evidently. “She will be an important character in your book,” he says. But I should steer clear for a while. “It won’t look good if she’s doing this from your press box. This should just be a concerned lady from Florida. I don’t want anything to happen to queer what I’m trying to do.”
For a guy who has no interest in power, he sure knows how to use it.
Still, when he says, “My narcissism couldn’t survive the teenage insight that we are all insignificant and transient worms, that no one has much stuff to strut,” it’s impossible to dismiss that as just another facile gloss from Machiavelli or Sun Tzu. And even if he has copped to exaggerating the flaws in DSM-IV for rhetorical purposes, there doesn’t seem to be anything tactical in his acknowledgment of the error that launched the mission we are on today, the one that has gotten Frances dressed in his Sunday best, and both of us up and out of the house by six. “This was one royal fuckup,” he says, and he’s on his way to try to set it straight.
• • •
Our destination is a meeting of lawyers. The royal fuckup was the kind that only they could love, an opportunity buried deep in the interstices of the DSM text, ready to be excavated and exploited—in this case by prosecutors who, aided by psychiatrists, can use it to keep certain sex offenders locked up well beyond the end of their sentences, indefinitely or maybe even forever.
That wasn’t what Frances and the rest of the DSM-IV crew had in mind when they decided that meeting Criterion A for Pedophilia—“over a period of at least 6 months, recurrent, intense sexually arousing fantasies, sexual urges or behaviors involving sexual activity with a prepubescent child”—was not enough to diagnose a patient with a mental disorder. The offender also needed to meet Criterion B: “The fantasies, sexual urges, or behaviors cause clinically significant distress or impairment in social, occupational, or other important areas of functioning.” This was only the boilerplate clinical significance criterion that had been added to many diagnoses in DSM-IV, and all Frances and First had intended by it, as they wrote in a 2008 American Journal of Psychiatry editorial, was to remind clinicians that, as they put it in the DSM-IV, “the symptom criteria alone are insufficient to define mental disorder.”
The Book of Woe: The DSM and the Unmaking of Psychiatry Page 25