At the time of writing, the American Psychiatric Association has just released a provisional timetable for the publication of the fifth edition of the Diagnostic and Statistical Manual, which is tentatively scheduled for 2010. Although the structure of the manual is yet to be determined, some observers have already speculated about its contents. For example, the American psychologist Roger Blashfield, in a paper ridiculing the entire approach, used statistical techniques to analyse trends in DSMs I–IV.32 Based on his calculations, he predicted that DSM-V will have 1256 pages, will contain 1800 diagnostic criteria and eleven appendices, and will generate $80 million in revenue for the APA. As the only basic colour that has yet to be used on the cover of a DSM is brown, Blashfield has predicted that this colour will be used for DSM-V.
By now the reader may be wondering whether the enormous effort and expenditure required to complete these successive revisions of the DSM has yielded the promised improvements in reliability. The neoKraepelinians clearly believed that they had succeeded in this respect. Reviewing the results of the DSM-III field trials, Spitzer described the reliability of the manual as extremely good.33 Similarly, Gerald Klerman (the psychiatrist who had given the name to the neoKraepelinian movement) felt able to declare that, in principle, the reliability problem had at last been solved.34
In a detailed review of the evidence from the field trials, psychologists Herb Kutchins and Stuart Kirk reached a starkly different conclusion, arguing that, ‘The DSM revolution in reliability has been a revolution in rhetoric, not in reality.’35 They noted that the trials were beset with numerous methodological problems. For example, little control was exercised over the way in which the studies were conducted at different sites, so that estimates of reliability were often calculated on the basis of agreements between close colleagues about small numbers of patients. Even so, the results for many diagnoses were hardly impressive, and often failed to reach the kappa value of 0.7 defined in advance as acceptable by Spitzer and his colleagues.
Following the publication of DSM-III, ever more ambitious reliability studies were conducted, but most obtained comparable results. For example, a seven-centre study carried out in the USA and Germany was reported by a team that included Spitzer.36 Nearly 400 patients and just over 200 ordinary people were examined by pairs of specially trained psychiatrists. The psychiatrists used an interview schedule specifically designed to yield the information required for assigning DSM-III-R diagnoses. Even under these ideal circumstances, levels of agreement for patients averaged at a kappa of 0.61.
Of course, the diagnoses in these kinds of studies are often obtained under ideal conditions. The diagnosticians taking part usually receive special training and employ standardized interview schedules that define precisely the questions they may ask their patients. Australian psychiatrist Patrick McGorry and his colleagues have argued that the levels of agreement observed therefore give a spuriously positive impression of what can be achieved in routine clinical practice.37 To investigate the effects of varying interview procedures, they asked psychiatrists to assign DSM-III-R diagnoses using four different methods. Three of the methods involved the use of special interview schedules or checklists to assess patients’ symptoms. The fourth involved each patient being interviewed by a team, which then arrived at a consensus diagnosis by discussion. When any two methods were compared, kappa values varied between 0.53 and 0.67. Full agreement between all four procedures was achieved for only twenty-seven out of the fifty patients who participated in the study.
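For readers who find a calculation easier to follow than a description, the following is a minimal sketch in Python of how a kappa value can be computed for two clinicians. The patient counts are invented for illustration and are not taken from any of the studies cited, and the function name cohen_kappa is a hypothetical label. The sketch shows how two raters who agree about 80 per cent of their patients can still obtain a kappa of only about 0.52, because much of that agreement would be expected by chance alone.

```python
def cohen_kappa(table):
    """Cohen's kappa for two raters.

    table[i][j] is the number of patients placed in category i by
    rater A and in category j by rater B.
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Proportion of patients on whom the two raters actually agree.
    observed = sum(table[i][i] for i in range(k)) / n

    # Agreement expected by chance, from each rater's marginal totals.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)

    return (observed - expected) / (1 - expected)


# Hypothetical data: 100 patients, each diagnosed by two psychiatrists
# as 'schizophrenia' (first row/column) or 'other' (second row/column).
table = [
    [20, 10],  # A says schizophrenia: B agrees on 20, disagrees on 10
    [10, 60],  # A says other: B disagrees on 10, agrees on 60
]

kappa = cohen_kappa(table)
print(round(kappa, 2))  # 0.52, below the 0.7 threshold mentioned above
```

On these invented figures the raw agreement is 0.80 and the agreement expected by chance is 0.58, which is why kappa falls so far short of the raw percentage.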
The Vanishing Consensus Effect
The development of the DSM system has so mesmerized recent thinking about psychiatric disorders that it is easy to forget that rival sets of criteria have also been proposed, such as the eighth and subsequent editions of the International Classification of Diseases, and the Feighner system. In addition to these major challengers to the diagnostic hegemony of the American Psychiatric Association, there are also a number of idiosyncratic definitions of specific disorders proposed by various researchers (for example, definitions of schizophrenia suggested by Taylor and Abrams38 and by Carpenter and his colleagues39 in the United States). When these different criteria are compared there is little to indicate which, if any, embodies the most meaningful taxonomy of mental disorders. What is clear is that the apparent consensus created by the DSM system is illusory.
A dramatic demonstration of this vanishing consensus effect was reported by Ian Brockington of the University of Birmingham, who applied various definitions of schizophrenia to symptom data collected from patients studied in the US–UK Diagnostic Project.40 Eighty-five patients from Netherne Hospital in London had received a hospital diagnosis of schizophrenia at the time of the study. Brockington estimated that 163 would have received the diagnosis according to the then existing American criteria. When the project team applied the ICD-8 definition, only 65 were found to suffer from schizophrenia. This number fell to 55 when a computer-generated diagnosis based on Schneider’s first-rank symptoms was employed. When the Research Diagnostic Criteria (a variant of the Feighner system) were used, the number of schizophrenia patients was 28, and the use of the DSM-III criteria saw this number fall even further to 19. Thus, the number of schizophrenia patients in the sample varied between 163 and 19, according to the definition chosen.
These troubles have not been restricted to the diagnosis of schizophrenia. In a study recently published by Jim van Os and his colleagues in Britain and Holland, over 700 patients with chronic psychosis were classified according to the RDC, DSM-III-R and ICD-10 systems.41 The numbers of patients meeting various definitions of schizophrenia, depression, mania and other diagnoses are shown in Table 3.5. For example, it can be seen that the number of patients in the sample suffering from mania varied from 18 (according to the RDC definition) to 87 (according to the DSM-III-R definition), a staggering nearly fivefold difference.
As a remedy for the vanishing consensus effect, some researchers have advocated a ‘polydiagnostic’ approach, in which several diagnostic systems are used simultaneously. A leading exponent of this approach is Peter McGuffin, Professor of Psychiatry at the Institute of Psychiatry in London, who has developed a computer program known as OPCRIT, which generates diagnoses according to the DSM-III, DSM-III-R, DSM-IV, ICD-9 and ICD-10 systems.42 Whether this approach clarifies or obscures the business of conducting psychiatric research is open to debate. Certainly, it offers no guidance to the practising clinician.
Table 3.5 Distribution of diagnoses in 706 psychotic patients according to three different diagnostic criteria (from J. van Os et al. (1999) ‘A comparison of the utility of dimensional and categorical representations of psychosis’, Psychological Medicine, 29: 595–606).

| Diagnosis | RDC N | RDC % | DSM-III-R N | DSM-III-R % | ICD-10 N | ICD-10 % |
|---|---|---|---|---|---|---|
| Schizophreniform disorder | - | - | 20 | 2.8 | - | - |
| Schizophrenia | 268 | 38.0 | 371 | 52.6 | 387 | 54.8 |
| Schizoaffective manic | 98 | 13.9 | - | - | 41 | 5.8 |
| Schizoaffective bipolar | 129 | 18.3 | 13 | 1.8 | 23 | 3.3 |
| Schizoaffective depressed | 118 | 16.7 | - | - | 40 | 5.7 |
| Major depression | 16 | 2.3 | 71 | 10.1 | 19 | 2.7 |
| Mania | 18 | 2.6 | 87 | 12.3 | 61 | 8.6 |
| Bipolar disorder | 16 | 2.3 | 66 | 9.4 | 6 | 0.9 |
| Unspecified functional psychosis | 43 | 6.1 | 68 | 9.6 | 95 | 13.5 |
| Delusional disorder | - | - | 10 | 1.4 | 18 | 2.6 |
| Not classified | - | - | - | - | 16 | 2.3 |
4
Fool’s Gold
In his fantastic essay ‘John Wilkins’ analytical language’, the Argentinian writer Jorge Luis Borges remarks on a classification of animals described in a certain Chinese encyclopaedia, The Celestial Emporium of Benevolent Knowledge:
In those remote pages it is stated that animals can be divided into the following classes: (a) belonging to the Emperor; (b) embalmed; (c) trained; (d) sucking pigs; (e) mermaids; (f) fabulous; (g) stray dogs; (h) included in this classification; (i) with the vigorous movements of madmen; (j) innumerable; (k) drawn with a very fine camel hair brush; (l) etcetera; (m) having just broken a large vase; (n) looking from a distance like flies.1
Borges thereby reminds us that not all taxonomies are meaningful. If his joke seems far removed from reality, consider the real-life example of astrology, a system of classification that provides a fool’s-gold standard against which to evaluate modern psychiatric diagnoses. Like diagnoses, star signs are supposed to describe human characteristics and to predict what will happen to people in the future. Although there is no evidence to support these claims, astrology is a system of classification that continues to capture the imagination of large numbers of otherwise intelligent people. Indeed, a recent poll revealed that almost one quarter of American adults believe in astrological theories.2
In the last chapter, we saw that one way of assessing the usefulness of a diagnostic system is to measure its reliability. However, the example of astrology illustrates the limitation of this approach. Star signs are highly reliable (we can all agree about who is born under Taurus), so reliability alone cannot ensure that a diagnostic system is scientific. Further tests of the validity of the system are necessary to determine whether it fulfils the functions for which it has been designed. We can test the validity of astrological theories by seeing whether people born under the sign of Libra really are well-balanced, or whether most Scorpios really do meet the beautiful stranger of their dreams in the first quarter of the year. Similarly, we can evaluate the validity of diagnostic categories by seeing whether they lead us to useful scientific insights or helpful clinical predictions.
Of course, although reliability does not guarantee validity, it is obvious that a diagnostic system cannot be valid without first being reliable. Unless psychiatrists and psychologists can agree about which patients suffer from which disorders there is no possibility that the process of diagnosis will fulfil any useful function. In the last chapter I established that, for the most part, modern psychiatric diagnoses fail to meet adequate standards of reliability. Some readers might therefore be forgiven for wondering whether there is any point in proceeding to examine validity evidence in detail. However, there are two good reasons for doing so. First, some readers, particularly those who have trained in the mental health professions, may require further persuasion before abandoning long-held assumptions about the nature of madness. Second, as we study the validity of psychiatric diagnoses we will encounter evidence that will be useful when attempting to construct a scientific alternative to the Kraepelinian system.
Diagnoses as Descriptions
One function of a diagnosis is to provide a shorthand description of a patient’s complaints. On the basis of a diagnosis recorded in the medical notes, a clinician about to meet a patient for the first time should be able to anticipate the range of symptoms that the patient will be suffering. However, diagnoses can only achieve this function if they accurately reflect the way that symptoms cluster together. Determining whether they do or do not is quite difficult, and requires the use of complex statistical procedures. Fortunately, the principles behind these procedures can be readily understood without knowledge of the relevant mathematical equations.
How many psychoses are there?
For a categorial system of diagnosis to work, diagnoses must be jointly exhaustive (there should be no psychiatric patients who fail to meet the criteria for a diagnosis) and mutually exclusive (patients, unless they are very unlucky, should not suffer from more than one disorder).3 In Kraepelin’s original system, there were only two or three major categories of psychosis. However, in order to make the DSM exhaustive, its authors have dramatically increased the number of definitions included in successive editions of the manual, and have also included catch-all ‘not otherwise specified’ diagnoses in order to sweep up anyone who does not fit the criteria for a specific disorder. At the same time, in order to ensure that diagnoses are mutually exclusive, the authors have had to include special exclusion rules to limit the possibility that patients will fall into more than one category.
Including subtypes as well as major diagnoses, 94 categories of disorder were included in DSM-I. This number rose to 137 in DSM-II, to 163 in DSM-III, to 174 in DSM-III-R and finally to 201 in DSM-IV. This expansion has been accompanied by an increasingly fine-grained subdivision of the major categories. For example, DSM-IV describes five subtypes of schizophrenia; two milder forms of psychosis (schizophreniform disorder and brief psychotic disorder); schizoaffective disorder; delusional disorder (formerly paranoia); shared psychotic disorder (folie à deux); psychotic disorder due to a medical condition; substance-induced psychotic disorder; and, finally, the catch-all ‘psychotic disorder not otherwise specified’. Bipolar disorder appears as two different subtypes (Type I and Type II, distinguished by the severity of manic episodes) in a separate section on mood disorders, along with seven other types of mood disorder, ‘mood disorder not otherwise specified’, and four types of mood ‘episodes’.
The arbitrary exclusion rules designed to limit the risk that patients will be judged to suffer from more than one type of illness – a phenomenon known as comorbidity4 – are listed at the end of each set of diagnostic criteria. For example, DSM-IV states that patients may not be diagnosed as suffering from schizophrenia if they also meet the criteria for schizoaffective disorder, major depression or mania. Similarly, the criteria for bipolar disorder specify that the patient’s symptoms should, ‘Not be better accounted for by schizoaffective disorder and are not superimposed on schizophrenia, schizophreniform disorder, delusional disorder, or psychotic disorder not otherwise specified.’
That the exclusion rules hide, rather than eliminate, the problem of comorbidity became evident in one of the most impressive surveys of psychiatric symptoms ever conducted. Funded by the US National Institute of Mental Health and carried out during the 1980s, it was known as the Epidemiological Catchment Area (ECA) Study, and its overall purpose was to determine the prevalence of various types of psychiatric disorder in the general population. A total of 18,500 people over 18 years of age were randomly selected from five US cities, and interviewed about their experiences of psychiatric symptoms and emotional distress. Over 15,000 agreed to a further interview one year later. The interview data were then used to assign diagnoses according to DSM-III criteria.
When the researchers suspended the arbitrary exclusion rules built into the DSM system, they found that 60 per cent of those who had met the criteria for one disorder during their lifetime had also met the criteria for at least one other.5 In order to quantify the extent of current comorbidity – the presence of two or more DSM-III disorders at the same point in time – they calculated a statistic known as the odds ratio, which indicates the chance of having a second diagnosis if a first one is present. (An odds ratio of 1 indicates no increased chance of a second diagnosis, an odds ratio of 2 indicates that the presence of one diagnosis doubles the chance of another, and so on. The researchers decided in advance that an odds ratio of 10 implied a strong association between diagnoses.)
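The arithmetic behind an odds ratio can be illustrated with a short sketch in Python. The counts below are invented and do not come from the ECA Study, and the function name odds_ratio is a hypothetical label. Strictly speaking, the statistic compares odds rather than simple chances: the odds of having the second diagnosis among people with the first, divided by the odds among people without it.

```python
def odds_ratio(both, first_only, second_only, neither):
    """Odds ratio from a 2 x 2 table of people cross-classified by
    whether they meet the criteria for each of two diagnoses."""
    # Odds of the second diagnosis among people with the first.
    odds_with_first = both / first_only
    # Odds of the second diagnosis among people without the first.
    odds_without_first = second_only / neither
    return odds_with_first / odds_without_first


# Hypothetical survey of 10,000 people.
# 100 meet the criteria for diagnosis A, of whom 30 also meet those for B.
# Of the 9,900 without A, 270 meet the criteria for B.
strong = odds_ratio(both=30, first_only=70, second_only=270, neither=9630)
print(round(strong, 1))  # 15.3, above the threshold of 10

# If B is equally common with or without A, the odds ratio is 1.
unrelated = odds_ratio(both=10, first_only=90, second_only=90, neither=810)
print(unrelated)  # 1.0
```

In the first invented example 30 per cent of people with diagnosis A also have diagnosis B, compared with under 3 per cent of everyone else, and it is this disparity that the odds ratio summarizes in a single number.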
On average, the odds ratio for any two diagnoses occurring together turned out to be 2 (that is, twice what would be expected by chance). The odds ratio for depression and mania was 36, which was perhaps unsurprising as both are regarded as phases of a single disorder. However, the odds ratio for schizophrenia and mania was 46, and for schizophrenia and depression it was 14. Project leader Lee Robins and her colleagues concluded that ‘The most likely explanation for co-occurrence is that having one disorder puts the affected person at risk of developing other disorders.’6 Presumably, they had in mind the possibility that patients suffering from, say, schizophrenia, might be so distressed by the experience that they would become depressed as well. Strangely, they did not discuss the possibility that their findings might instead reflect the inadequacies of the neoKraepelinian system. Clearly, the most likely explanation for the strong associations observed between schizophrenia, depression and mania is that these diagnoses do not describe separate disorders.
We can test this suspicion by looking at symptoms in a different way. Suppose that, instead of diagnosing patients as schizophrenic or manic-depressive, we instead assign them scores indicating the extent to which their symptoms correspond to one diagnosis or the other. If we then place patients along this dimension of schizophrenic versus manic-depressive, we should be able to observe whether they fall into two separate groups with clearly different scores, or whether many patients have intermediate scores indicating a mixture of the two types of symptoms. I have just described, in a very simple way, the rationale for a statistical technique known as discriminant function analysis. The actual methods used to calculate the scores of individual patients are fairly complex and can be ignored for present purposes.7
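For readers who would like to see the idea in concrete form, the following Python sketch implements Fisher's two-group linear discriminant, one simple version of the technique. It is offered as an illustration only and is not necessarily the procedure used in the studies discussed in this chapter. The symptom ratings are invented, each patient is described by just two hypothetical scores (one for schizophrenic symptoms, one for mood symptoms), and all function names are labels chosen for this example.

```python
def mean_vector(rows):
    """Mean of each of the two symptom ratings across a group."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, m):
    """Within-group sums of squares and cross-products about the mean m."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    return sxx, sxy, syy


def fisher_weights(group_a, group_b):
    """Weights w = S^-1 (mean_a - mean_b), where S is the pooled
    within-group scatter matrix (inverted by hand for the 2 x 2 case)."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    axx, axy, ayy = scatter(group_a, ma)
    bxx, bxy, byy = scatter(group_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    return [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]


def score(patient, w):
    """Position of one patient on the discriminant dimension."""
    return w[0] * patient[0] + w[1] * patient[1]


# Invented ratings: (schizophrenic symptoms, mood symptoms), each 0-10.
schizophrenia = [(8, 2), (7, 4), (9, 3), (6, 5), (5, 4)]
manic_depression = [(2, 8), (3, 6), (1, 7), (4, 5), (5, 6)]

w = fisher_weights(schizophrenia, manic_depression)
scores_a = [score(p, w) for p in schizophrenia]
scores_b = [score(p, w) for p in manic_depression]

# Listing every patient in order of score shows whether the two groups
# form separate clusters or shade into one another.
for s in sorted(scores_a + scores_b):
    print(round(s, 2))
```

The question of interest is the shape of the resulting list of scores. Two tight clusters with a gap between them would suggest two distinct disorders, whereas many patients with intermediate scores would suggest a continuum.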
Madness Explained Page 9