The Bell Curve: Intelligence and Class Structure in American Life
29 The so-called Bay Area Survey, described in Neuman 1981, 1986.
30 See note 21.
31 Neuman 1986, p. 117.
32 Useful summaries can be found in Abramson and Claggett 1991; Hill and Luttbeg 1983; Kleppner 1982; Peterson 1990; Rothenberg and Licht 1982.
33 E.g., Milbrath and Goel 1977. Biological and social scientists have lately tried to enrich our understanding of “political man” by showing the links to social behavior in other species. For background to the huge literature on the variety of influences on political behavior and attitudes, see Converse 1964; Kinder and Sears 1985; Rokeach 1973.
34 Harvey and Harvey 1970.
35 Neuman 1986.
36 Luskin 1990.
Chapter 13
1 For a useful recent critique of the treatment of race by psychologists, also demonstrating how difficult (impossible?) it is to be detached about this issue, see Yee et al. 1993.
2 Lynn 1991c.
3 Lynn 1987a. For a critique of Lynn’s early work, see Stevenson and Azuma 1983.
4 For those who want to reconstruct the debate, Lynn’s 1987 and 1991 review articles followed on earlier studies: Lynn 1977, 1978, 1982; Lynn and Hampson 1986b. For his response to Flynn’s 1987 critique, see Lynn 1987b.
5 Chan and Vernon 1988.
6 Lynn and Song 1994.
7 Iwawaki and Vernon 1988; Vernon 1982.
8 Flynn 1991; Sue and Okazaki 1990.
9 Flynn 1991.
10 Lynn 1993b.
11 Lynn 1987a, 1987b, 1989, 1990a, 1990b, 1991b, 1991c, 1992, 1993a, 1993b; Lynn and Hattori 1990; Lynn, Pagliari, and Chan 1988.
12 Lynn, Hampson, and Iwawaki 1987.
13 Lynn 1991c.
14 Stevenson et al. 1985.
15 Lynn 1991a, p. 733. Lynn has noted that the mean white IQ in Minnesota is approximately 105, well above the average for the American white population. On the other hand, it is possible that the cities chosen in Japan and Taiwan were similarly elevated.
16 An excellent account of the literature may be found in Storfer 1990, pp. 314-321, from which our generalizations are taken. For Jews in Britain, see also Lynn 1992.
17 Storfer 1990, pp. 321-323.
18 As reported in Jensen 1984b, p. 479.
19 Sattler 1988.
20 A detailed and comprehensive review of the literature through 1980 may be found in Osborne and McGurk 1982; Shuey 1966. For an excellent one-volume synthesis and analysis, see Loehlin, Lindzey, and Spuhler 1975.
21 Standard deviations are explained in Appendix 1.
22 To qualify, all studies had to report data for both a white and black sample, with a sample size of at least fifty in each group, drawn from comparable populations that purported to be representative of the general population of that age and geographic area (studies of special populations such as delinquents were excluded). Socioeconomic status posed a special problem. If a study explicitly matched subjects by SES, it was excluded. If it simply drew its samples from a low-SES area, it was included, even though some degree of matching had occurred. The study had to use a standardized test of cognitive ability, although not all of them were IQ tests and not all included a complete battery. If the scores were reported as IQs, a standard deviation of 15 was imputed if no standard deviations for that sample were given.
23 To get the IQ equivalent of SD differences, multiply the SD difference by 15; hence, 1.08 × 15 = 16.2 IQ points.
24 This figure is based on non-Latino whites. The difference between blacks and the combined white-Latino sample in the NLSY is 1.12 SDs. Because the U.S. Latino population was proportionally very small until the 1970s, the NLSY figure for non-Latino whites is more comparable to the earlier tests, in terms of definition of the sample, than the figure for the combined white-Latino sample, and we shall use it exclusively in discussions of the NLSY data throughout the chapter.
25 The formula is $(X_w - X_b) / \sqrt{(N_w \sigma_w^2 + N_b \sigma_b^2) / (N_w + N_b)}$, where N is the sample size, X is the sample mean, σ is the standard deviation, and w and b stand for white and black, respectively (taken from Jensen and Reynolds 1982, p. 425). Note that our white sample differs from the one used in Office of the Assistant Secretary of Defense (Manpower) (1982). The “white” sample in that report included all persons not identified as Hispanic or black, whereas our “white” sample also excluded persons identifying themselves as American Indians or a member of an Asian or Pacific ethnic group. The NLSY and the AFQT are described in the Introduction to Part II and Appendix 2.
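The standardized difference described in note 25 can be sketched in a few lines of Python. This is a minimal illustration, assuming the difference in means is divided by an N-weighted pooled standard deviation; the function names and the input values are hypothetical and are not NLSY figures.

```python
import math

def pooled_sd(n_w, sd_w, n_b, sd_b):
    """N-weighted pooled standard deviation (assumed form of the formula)."""
    return math.sqrt((n_w * sd_w**2 + n_b * sd_b**2) / (n_w + n_b))

def standardized_difference(n_w, mean_w, sd_w, n_b, mean_b, sd_b):
    """Difference between the two sample means, expressed in pooled SD units."""
    return (mean_w - mean_b) / pooled_sd(n_w, sd_w, n_b, sd_b)

# Hypothetical inputs chosen only to make the arithmetic easy to follow.
d = standardized_difference(1000, 100.0, 15.0, 1000, 85.0, 15.0)

# Note 23's conversion: multiply an SD difference by 15 to get IQ points.
iq_points = d * 15

print(f"difference = {d:.2f} SD, or {iq_points:.1f} IQ points")
```

With equal standard deviations in both groups the pooled value equals the common standard deviation, which is why the hypothetical inputs above yield a difference of exactly one SD.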
26 This is a very rough estimate. As of 1994 there were approximately 32.8 million blacks in America. If the estimate is computed based on the mean IQ (86.7) and standard deviation (12.4) of blacks in the NLSY, a table of the normal distribution indicates that only about 0.1 percent, or about 33,000, would have IQs of 125 or higher. If one applies the observed distribution in the NLSY and asks what proportion of blacks are in the top five percent of the AFQT distribution (roughly corresponding to an IQ of 125), the result, 0.4 percent, implies that the answer is about 131,000. There are reasons to think that both estimates err in different directions. We compromised with 100,000.
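The two estimates in note 26 can be reproduced approximately with the Python standard library. This is a rough sketch: the normal-curve figure uses the complementary error function in place of a printed table, the 0.4 percent figure is taken directly from the note, and the variable names are illustrative.

```python
import math

def upper_tail(threshold, mean, sd):
    """Share of a normal distribution lying at or above the threshold."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

population = 32_800_000  # approximate 1994 population given in the note

# Estimate 1: normal curve with the NLSY mean (86.7) and SD (12.4).
p_normal = upper_tail(125, 86.7, 12.4)
count_normal = population * p_normal

# Estimate 2: observed NLSY share in the top five percent of the AFQT.
p_observed = 0.004
count_observed = population * p_observed

print(f"normal-curve estimate: {p_normal:.4%}, about {count_normal:,.0f}")
print(f"observed-share estimate: {p_observed:.1%}, about {count_observed:,.0f}")
```

The normal-curve share comes out at roughly 0.1 percent, and the observed share implies about 131,000, in line with the figures quoted in the note.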
27 For example, no external evidence for bias has turned up with the WISC, WAIS, Stanford-Binet, Iowa Test of Educational Development, California Achievement Test, SAT, ACT, GRE, LSAT, MCAT, Wonderlic Personnel Test, GATB, and ASVAB (including the AFQT in particular).
28 If any bias has been found, it shows that test scores for blacks often “overpredict” performance; that is, the tests are biased “in favor” of blacks, tending for unknown reasons to predict higher performance than is actually observed. See Appendix 5 for details.
29 Weiss 1987, p. 121. A separate argument, made in Zoref and Williams (1980), adduced evidence that verbal items in IQ tests are disproportionately based on white males “in role-stereotyped representations.” The authors do not present evidence that performance on these items varies by race or gender in ways that would indicate bias but rather indict the tests as a whole on the basis of their sexism and racism.
30 The reason why the “oarsman:regatta” example has been used so often in descriptions of cultural bias is that it is one of the few items in the SAT that looks so obviously guilty. Perhaps if a test consisted exclusively of items that were equivalent to the example, it would be possible to demonstrate cultural bias statistically, but no modern test has more than a few that come close to “oarsman:regatta.”
31 The definitive assessment of internal evidence of bias is in Jensen 1980.
32 E.g., Valencia and Rankin 1988; Munford and Muñoz 1980.
33 For a review, see Jensen 1980.
34 The NLSY has higher scores for whites than blacks on backward digit span and virtually no difference at all for forward digit span. In a similar way, SES differences within races are also greater for backward digit tests than forward digit tests (Jensen and Figueroa 1975).
35 Gordon 1984. See Farrell 1983, and the attached responses, for an attempt to explain the difference in digit span results through cultural bias hypotheses.
36 Another commonly used apparatus involves a home button and a pair of other buttons, for yes and no, in response to tasks presented by a computer console. The results from both types of apparatus are congruent.
37 The literature is extensive, and we are bypassing which aspect of reaction time in fact covaries with g. For our purposes, it is only necessary that some aspects do so. For some of the issues, see, for example, Barrett, Eysenck, and Lucking 1986; Matthews and Dorn 1989; Vernon 1983; Vernon et al. 1985.
38 Jensen and Munro 1979.
39 Jensen 1993b.
40 The dependent variable is age-equated IQ score, and the independent variables are a binary variable for race (white or black) and the parental SES index. The difference between the resulting predicted IQs is divided by the pooled weighted standard deviation.
41 Among the young women in the RAND study of adolescent pregnancy described in Chapter 8 (Abrahamse et al. 1988), drawn from the nationally representative High School and Beyond sample, the same procedure reduced the B/W difference by 32 percent. See also Jensen and Reynolds 1982 and Jensen and Figueroa 1975.
42 For some people, controlling for status is a tacit way of isolating the genetic difference between the races. This logic is as fallacious as the logic behind controlling for SES that ignores the ways in which IQ helps determine socioeconomic status. See later in the chapter for our views on genetics and the B/W difference.
43 In other major studies the B/W difference continues to widen even at the highest SES levels. For example, Jensen and Figueroa (1975) obtained full-scale WISC IQ scores for 622 whites and 622 blacks, ages 5 to 12, from a random sample of ninety-eight California school districts. They broke down the scores into ten categories of SES, using Duncan’s index of socioeconomic prestige based on occupation. They found a B/W discrepancy that went from a mere .13 SD in the lowest SES decile up to 1.20 SD in the highest SES decile. Going to the opposite type of test data, the Scholastic Aptitude Test taken by millions, self-selected with a bias toward the upper end of the cognitive distribution, the same pattern emerged. In 1991, to take a typical year, the B/W difference among students whose parents had less than a high school diploma was .58 SD (averaging verbal and mathematical scores), while the B/W difference among students whose parents had a graduate degree was .78 SD (National Ethnic/Sex Data for 1991, unpublished data available by request from the College Board). In their separate reviews of the literature, Audrey Shuey (whose review was published in 1966) and John Loehlin and his colleagues (review published in 1975) identified thirteen studies conducted from 1948 through the early 1970s that presented IQ means for low- and high-SES groups by race. In twelve of the thirteen studies, the black-white difference in IQ was higher for the higher-SES group than for the lower-SES group. For similar results for the 1981 standardization of the WAIS-R, see Reynolds et al. 1987. A final comment is that the NLSY also shows an increasing B/W difference at the upper end of the socioeconomic scale when the 1980 AFQT scoring system is used and the scores are not corrected for skew. See Appendix 2 for a discussion of the scoring issues.
44 Kendall, Verster, and Mollendorf 1988.
45 Kendall, Verster, and Mollendorf 1988. For another example, this time of an entire book devoted to testing in the African setting that fails to mention a single mean, see Schwarz and Krug 1972.
46 Lynn 1991c.
47 Boissiere et al. 1985.
48 Owen 1992.
49 Reynolds et al. 1987.
50 Vincent 1991.
51 Vincent also cites two nonnormative studies of children in which the B/W differences ranged from only one to nine points. These are the differences after controlling for SES, which, as we explain in the text, shrinks the B/W gap by about one-third.
52 Jensen 1984a; Jensen and Naglieri 1987; Naglieri 1986. They point out that the K-ABC test is less saturated with g than a conventional IQ measure and more dependent on memory, both of which would tend to reduce the B/W difference (Naglieri and Bardos 1987).
53 Jensen 1993b.
54 Based on the white and black SDs for 1980, the first year that standard deviations by race were published.
55 Wainer 1988.
56 Our reasons for concluding that the narrowing of the B/W differences on the SAT was real, despite the potential artifacts involved in SAT scores, are as follows. Regarding the self-selection problem, the key consideration is that the proportion of blacks taking the test rose throughout the 1976-1993 period (including the subperiod 1980-1993). In 1976, blacks who took the SAT represented 10 percent of black 17-year-olds; in 1980, the proportion had risen to 13 percent; by 1993, it had risen to about 20 percent. While this does not necessarily mean that blacks taking the SAT were coming from lower socioeconomic groups (the data on parental education and income from 1980 to 1993 indicate they were not), the pool probably became less selective insofar as it drew from lower portions of the ability distribution. The improvement in black scores is therefore more likely to be understated by the SAT data than exaggerated.
Howard Wainer (1988) has argued that changes in black test scores are uninterpretable because of anomalies that could be inferred from the test scores of students who did not disclose their ethnicity on the SAT background questionnaire (nonresponders). Apart from several technical questions about Wainer’s conclusions that arise from his presentation, the key point is that the nonresponder population has diminished substantially. As it has diminished, there are no signs that the story told by the SAT is changing. The basic shape of the falling trendline for the black-white difference cannot plausibly be affected by nonresponders (though the true means in any given year might well be somewhat different from the means based on those who identify their ethnicity).
57 The range of .15 to .25 SD takes the data in both the text and Appendix 5 into account. To calculate the narrowing in IQ terms, we need to estimate the correlation between IQ and the various measures of educational preparation. A lower correlation would shrink the estimate of the amount of IQ narrowing between blacks and whites, and vice versa for a higher estimate. The two- to three-point estimate in the text assumes that this correlation is somewhere between .6 and .8. If we instead rely entirely on the SAT data and consider it to be a measure of intelligence per se, then the narrowing has been four points in IQ, but only for the population that actually takes the test.
58 A change of one IQ point in a generation for genetic reasons is not out of the realm of possibility, given sufficient differential fertility. However, the evidence on differential fertility (see Chapter 15) implies not a shrinking black-white gap but a growing one.
59 Jaynes and Williams 1989; Jencks and Peterson 1991.
60 Linear extrapolations are not to be taken seriously in these situations. A linear continuation of the black and white SAT trends from 1980 to 1990 would bring a convergence with the white mean in the year 2035 on the Verbal and 2053 on the Math. And when it occurs, racial differences would not be ended, for if we apply the same logic to the Asian scores, in that year of 2053 when blacks and whites both have a mean of 555 on the Math test, the Asian mean would be 632. The Asian Verbal mean (again, based on 1980-1990) would be 510 in the year 2053, forty-seven points ahead of the white mean. But (such is the logic of linear extrapolations from a short time period) the black Verbal score would by that time have surpassed the white mean by thirty-seven points and would be 500, only ten points behind the Asians. In 2069, the black Verbal mean would surpass the Asian Verbal mean. Linear trends over short periods of time cannot be sensibly extrapolated much into the future, notwithstanding how often one sees such extrapolations in the media.
61 See Appendix 5 for ACT results. In short, the mean rose from 16.2 in 1986 to 17.1 in 1993. The number of black ACT students also continued to rise during this period, suggesting that the increase after 1986 was not the result of a more selective pool.
62 Chapter 18 explores this line of thought further.
63 SAT trends are subject to a variety of questions relating to the changing nature of the SAT pool. The discussion that follows is based on unreported analyses checking out the possibility that the results reflect these potential artifacts (e.g., changes in the proportion of Asians using English as their first language; changes in the proportion of students coming from homes where the parents did not go to college). The discussion of these matters may be found in Chapter 18.
64 The first year for which a frequency distribution of scores by ethnicity has been published is 1980.
65 Trying to predict trends on the basis of equivalent percentage changes from different baselines is a treacherous proposition. A comparison with black and Asian gains makes the point. For example, the percentage of blacks scoring in the 700s on the SAT-Verbal grew by 23 percent from 1980 to 1990, within a percentage point of the Asian proportional increase. For students scoring in the 600s, the black increase was 37 percent, not far below the Asian increase of 48 percent. The difficulty with using proportions in this instance is that the baselines are so different. Take the case of students scoring in the 600s on the SAT-V, for example. The proportions that produced that 37 percent increase for blacks were eleven students out of a thousand in 1980 versus fifteen students out of a thousand in 1990. The Asian change, put in the same metric, was from fifty-five students in 1980 to eighty-one students in 1990. For every four students per thousand that blacks gained in the 600 group, Asians gained twenty-six per thousand.
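The contrast between proportional and absolute change in note 65 can be checked with a short Python sketch. It uses the rounded per-thousand counts quoted in the note, so the percentages it produces differ slightly from the 37 and 48 percent given there, which were presumably computed from unrounded figures (an assumption on our part). The variable names are illustrative.

```python
def pct_increase(before, after):
    """Proportional change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Students per thousand scoring in the 600s on the SAT-V (rounded, from the note).
black_1980, black_1990 = 11, 15
asian_1980, asian_1990 = 55, 81

black_pct = pct_increase(black_1980, black_1990)
asian_pct = pct_increase(asian_1980, asian_1990)

# Absolute gains per thousand students.
black_abs = black_1990 - black_1980
asian_abs = asian_1990 - asian_1980

print(f"black: {black_pct:.1f} percent, +{black_abs} per thousand")
print(f"Asian: {asian_pct:.1f} percent, +{asian_abs} per thousand")
```

The percentage gains are of similar size, while the absolute gains differ by a factor of more than six, which is the point the note makes about differing baselines.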
66 This statement is based on a calculation that assumes that the 1980 distribution of scores remained the same except for the categories of interest. To illustrate, in 1980, 19.8 percent of black students scored from 200 to 249. In 1993, only 13.1 percent scored in that range. Suppose that we treat the percentage distribution for 1980 as if it consisted of 1,000 students. In that year, 198 of those students scored in the 200 to 249 range. We then recompute the mean for the 1980 distribution, substituting 128 for 198 in the 200 to 249 point category (assigning midpoint values to all the intervals to reach a grouped mean), so in effect we are calculating a mean for a fictitious population of 1,000 − 198 + 128 = 930. (The actual calculations used unrounded proportions based on the actual frequencies in each interval.)
A technical note for those who might wish to reproduce this analysis: When means are computed from grouped data, the midpoint of an interval is not necessarily the actual mean of people in that interval, usually because more than 50 percent of the scores will tend to be found in the fatter part of the distribution covered by the interval but also because scores may be bunched at the extreme categories. In the SAT-Math, for example, a disproportionate number of the people in the interval from 750 to 800 have scores of 800 and of those in the interval from 200 to 249 have scores of 200 (because they guessed wrong so often that their score is driven down to the minimum). Such effects can produce a noticeable bias in the estimated mean. For example, the actual verbal mean of black students in 1980 was 330. If one computes the mean based on the distribution published annually by the College Board, which runs in fifty-point intervals from 200 to 800, the result is 336.4. The actual mean in 1990 was 352; the grouped mean is 357.9. The computed figure in the text is based on the surrogate mean as described above compared to the grouped 1980 and 1990 means, to provide a consistent framework.
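For readers who wish to reproduce the surrogate-mean procedure in note 66, the following Python sketch shows the mechanics. Only the counts 198 and 128 for the 200 to 249 interval come from the note; every other count is hypothetical, the distribution is truncated at 499 for brevity, and the function names are illustrative. The actual calculations used the published College Board frequencies.

```python
def grouped_mean(counts):
    """Mean of grouped data, assigning each interval its midpoint value."""
    total = sum(counts.values())
    weighted = sum(((lo + hi) / 2) * n for (lo, hi), n in counts.items())
    return weighted / total

def substitute(counts, interval, new_count):
    """Copy of the distribution with one interval's count replaced."""
    updated = dict(counts)
    updated[interval] = new_count
    return updated

# Hypothetical baseline distribution scaled to 1,000 students.
# Only the 198 in the lowest interval is taken from the note.
baseline = {
    (200, 249): 198,
    (250, 299): 250,
    (300, 349): 252,
    (350, 399): 150,
    (400, 449): 100,
    (450, 499): 50,
}

# Substitute the later count (128) for the lowest interval only.
surrogate = substitute(baseline, (200, 249), 128)

baseline_mean = grouped_mean(baseline)
surrogate_mean = grouped_mean(surrogate)

print(f"fictitious population size: {sum(surrogate.values())}")
print(f"grouped mean rises by {surrogate_mean - baseline_mean:.1f} points")
```

Because both means are computed from interval midpoints, the comparison shares the same grouping bias on both sides, which is the consistent framework the technical note describes.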