The Bell Curve: Intelligence and Class Structure in American Life
The story regarding practice and coaching for such tests as the Scholastic Aptitude Test (SAT), the Law School Admissions Test (LSAT), and the Medical College Admissions Test (MCAT) is much more contentious than the story about IQ. Many people do take these tests more than once, many people practice for them, and many people get extensive coaching. Moreover, these tests are supposed to be “coachable,” insofar as they measure the verbal, reasoning, and analytic skills that a good education is supposed to enhance, and prolonged exposure to such coaching should produce better scores. Or to put it another way, two students with the same IQ should be able to get different LSAT and MCAT scores if one student has taken more appropriate courses and studied harder than the other student. That SAT scores declined by almost half a standard deviation from 1964 to 1980 strongly suggests that something coachable—or “negatively coachable” in this example—is being measured. In Chapter 17, we discuss the effects of coaching for the SAT, which are real but also smaller and harder to obtain than the widely advertised claims of the coaching industry.
The belief that coaching might explain part of the black-white gap often rests on a notion that, on the average, blacks receive less of the practice and coaching that might have elevated their scores than does the average white. We have already undermined this notion by showing that the tests are biased against blacks neither predictively nor in terms of particular item difficulties. There is, however, a literature that bears more directly on this idea, by looking for an interaction effect between practice or coaching and race.
If practice and coaching explain any portion of a group difference in scores in the population as a whole, then it necessarily follows that representative samples of those groups who are equally well practiced and well coached will show a smaller difference than is observed in the population at large. It is not enough that practice or coaching raises the mean score of the lower-scoring group; it must raise its mean score more than it raises the score of the higher-scoring group.
Several studies have investigated whether this is found for blacks and whites. In a well-designed study, representative samples of blacks and whites are randomly divided into two groups. The experimental black and white groups receive identical coaching (or practice), and the control groups receive no treatment at all. At the end of the experiment, the investigator has four different sets of results: test scores for coached blacks, uncoached blacks, coached whites, and uncoached whites. These results may be analyzed in three basic ways: One may compare blacks overall with whites overall, which will reveal the main effect of race; or the coached samples overall with the uncoached samples overall, which will reveal the main effect of the coaching; or the way in which the effects of coaching vary according to the race of the persons being coached, known as the interaction effect.
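The three comparisons just described can be made concrete with a small sketch. The cell means below are invented purely for illustration and are not drawn from any study cited here; the function name and the group labels "A" and "B" are likewise hypothetical. The sketch assumes equal cell sizes, so each main effect is a difference of marginal means and the interaction is a difference of differences.

```python
def two_by_two_effects(cells):
    """Summarize a 2x2 (group x coaching) design with equal cell sizes.

    `cells` maps (group, condition) -> mean score, where group is "A" or "B"
    and condition is "coached" or "uncoached". All names are hypothetical.
    """
    a_c, a_u = cells[("A", "coached")], cells[("A", "uncoached")]
    b_c, b_u = cells[("B", "coached")], cells[("B", "uncoached")]
    return {
        # Main effect of group: B's marginal mean minus A's marginal mean.
        "group_effect": (b_c + b_u) / 2 - (a_c + a_u) / 2,
        # Main effect of coaching: coached marginal mean minus uncoached.
        "coaching_effect": (a_c + b_c) / 2 - (a_u + b_u) / 2,
        # Interaction: A's gain from coaching minus B's gain from coaching.
        "interaction": (a_c - a_u) - (b_c - b_u),
        # Group gap within each condition.
        "gap_uncoached": b_u - a_u,
        "gap_coached": b_c - a_c,
    }


# Invented cell means, for illustration only.
example = two_by_two_effects({
    ("A", "coached"): 52.0, ("A", "uncoached"): 50.0,
    ("B", "coached"): 62.0, ("B", "uncoached"): 60.0,
})
```

In this invented case coaching raises both groups by the same amount, so the interaction is zero and the gap among coached samples equals the gap among uncoached samples. Coaching can explain part of a group difference only when the interaction term is nonzero in the direction that favors the lower-scoring group.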
One study found a statistically significant differential response to practice, but not to direct instruction, on a reasoning test, between black and white college students.18 The differential advantage of practice for blacks compared to whites was about an eighth of the overall black-white gap on this test. Other studies have failed to find even this much of a differential response, or they have found differential responses in the opposite direction, tending to increase the black-white gap after practice.19 Taking the evidence as a whole, any differential coaching and practice effect by race (or socioeconomic status) is at most sporadic and small. If such a differential effect exists, it is too small to be replicated reliably. The scattered evidence of a differential effect is about as supportive of a white advantage from coaching as of a black advantage.
EXAMINER EFFECTS AND OTHER SITUATIONAL VARIABLES. Is it possible that disadvantaged groups come to the test with greater anxiety than confident middle-class students, and this mental state depresses their scores? That, when a black student takes a bus across town to an unfamiliar neighborhood and goes into a testing room filled with white students and overseen by a white test supervisor, this situation has an intimidating effect on performance? What about the time limits on tests? Might these have more pronounced effects on disadvantaged students than on test-wise middle-class students? All are plausible questions, but the answer to each is the same: Investigations to date give no reason to believe that such considerations explain a nontrivial portion of the group differences in scores.
The race of the examiner has been the subject of numerous studies. Of those with adequate experimental designs, most have shown nonsignificant effects; of the rest, the evidence is as strong that the presence of a white examiner reduces the overall black-white difference as that a white examiner exacerbates the difference.20 Examinations of the results of time pressures fail to demonstrate either that blacks do better in untimed than in timed tests or that the test-taking “personal tempo” of blacks is different from that of whites.21 Test anxiety has been investigated extensively but, as in so many other aspects of this discussion, the relationship tends to be the opposite of the expected one: To the extent that test anxiety affects performance at all, it seems to help slightly. Only a few studies have specifically addressed black-white differences in test anxiety; they have shown either nonsignificant results, or that the white subjects were slightly more anxious than the black subjects.22
“BLACK ENGLISH.” Language looms larger. It is well established that students from many different cultural backgrounds for whom English is a second language tend to score better on the nonverbal part of a test than on a verbal component given in English.23 Whereas this imbalance may be independent of language for East Asians (Japanese in Japan have superior nonverbal scores even when taking verbal test batteries designed in Japanese), it is also manifest among Latinos, who do not otherwise exhibit the characteristic East Asian verbal-nonverbal pattern. This suggests that students who are taking the test in a second language suffer some decrement of their scores.
It has been a small step from this to hypothesize that, for practical purposes, many blacks are taking the test in a “second language,” with their first language being the dialect known as “black English,” ubiquitous in the black inner city and used to some extent by blacks of broader socioeconomic backgrounds. Researchers have approached the issue in several ways. First, the evidence indicates that black children who use black English understand standard English at least as well.24 A more direct test came in the 1970s, when L. C. Quay had the Stanford-Binet translated into black dialect and tested several samples with both the original and the revised version. The studies produced no evidence that black students in any of the various test groups benefited (the differences in scores from the two tests generally amounted to less than one IQ point).25 But the most powerful data suggesting that language does not explain the black-white difference is provided by the evidence for Spearman’s hypothesis presented in Chapter 13: If language were the problem, then blacks would be at the greatest disadvantage on test items that rely on a knowledge of standard English and be at the least disadvantage on test items that use no language at all. As we discuss with regard to Spearman’s hypothesis in Chapter 13, this expectation is contradicted by a large and consistent body of work. Black populations generally do relatively better on test items that are less saturated with g and relatively worse on items more saturated with g, whether the items are verbal or nonverbal.
The Continuing Debate
Allegations that standardized tests are culturally biased still appear, and presumably this account will fuel additional ones. What about all the articles appearing in many quarters making these claims? They make up a varied lot, but typically consist of allegations that ignore the data. A particularly striking example was a long article entitled “IQ and Standard English,” which appeared in a technical journal and attributed the black-white IQ test differences to language difficulties. The article was followed by four responses, plus a counterstatement by the author. Neither the original article nor any of the responses cited any of the data discussed above.26 The debate was carried on entirely on the basis of argumentation about the extent to which black culture is more orally based than white culture. This readiness to theorize about what might be true about black-white differences in test scores while ignoring the pertinent data is common.
Other articles, cited in the note, have discussed a variety of ways in which culture interacts with human functioning, intellectual and otherwise.27 The movement surrounding Howard Gardner’s concept of multiple intelligences (see the Introduction) is only the best known of these new ways of talking about intelligence. But these discussions do not try to argue with the two core statements that we have made: In the major standardized tests, test items function in the same way for both blacks and whites, and the test results are similarly predictive for blacks and whites, tending to overpredict black performance rather than underpredict it.
In the popular media, the persistence of belief in cultural bias, we think, is based on a misapprehension. To many people, proof that tests are unbiased seems tantamount to proof that the black-white gap reflects genetic differences in intelligence. Since they reject the possibility that genetic differences could be involved, they conclude that the tests must be biased. One of the major purposes of Chapter 13 is to discredit both the notion that real differences in intelligence must be genetically founded and the assumption that a role for genes must have horrific consequences.
IS THE BLACK-WHITE DIFFERENCE IN COGNITIVE ABILITY SHRINKING?
The text discusses the evidence for converging black and white test scores on the NAEP (National Assessment of Educational Progress) and the SAT. Here, we summarize other sources of data about the two ethnic populations.
National High School Studies, 1972 and 1980
In 1972 and 1980, the federal government sponsored large-sample studies intended to provide reliable national estimates of the high school population. As part of both studies, tests measuring vocabulary, reading, and mathematics were administered to all participants. Although not technically IQ tests, all three had high g loadings. Furthermore, the tests were virtually identical for the two test administrations,28 and the study procedures in 1980 were deliberately constructed to maximize the comparability of the two samples. In 1982, the sophomores from the 1980 sample were tested as seniors. The table below summarizes the results for the three test years by ethnic group. The black-white difference diminished on two of the three tests, but all of the shrinkage came about because white scores fell, not because black scores rose. Indeed, black scores also fell on all three tests but, except in the case of vocabulary, by less than the reduction in white scores.
Black-White Difference for High School Seniors in 1972, 1980, and 1982
White-Black Difference, in SDs
              1972   1980   1982
Vocabulary    1.00    .87   1.02
Reading        .99    .85    .78
Math          1.09    .91    .86
Source: Rock et al. 1985, Appendixes B, C, E.
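The tables in this section express group differences in standard deviation units. A minimal sketch of that conversion follows. It assumes the raw difference in means is scaled by a pooled standard deviation, which is one common convention and not necessarily the one used by Rock et al. or in the other tables here. The means, standard deviations, and sample sizes are hypothetical placeholders.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups (an assumed convention)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def standardized_difference(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two group means, expressed in pooled-SD units."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Hypothetical inputs: a 10-point raw difference with a 10-point SD in each group.
d = standardized_difference(mean1=55.0, sd1=10.0, n1=400,
                            mean2=45.0, sd2=10.0, n2=400)
```

Because the difference is divided by the spread of scores, a change in the tabled value from one year to the next can come from a change in either group's mean or from a change in score variability.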
College Board Achievement Tests
THE SAT. In Chapter 13, we noted that the overall black-white gap in SAT scores had narrowed between 1976 and 1993, from 1.16 to .88 standard deviation in the verbal portion of the test and from 1.27 to .92 standard deviation in the mathematics portion of the test.29 More detailed breakdowns are available for the period 1980 to 1991, as shown in the table below. The trend is consistently positive, with narrowing black-white differences of at least .1 standard deviation units on the tests for Literature, European History, Math II, Physics, French, German, Latin, and Spanish. The average shrinkage of the gap is .05 standard deviation unit. From further analyses, we conclude that the narrowing is not entirely explained away by changes in the representativeness of the black and white samples of test takers or by declining white scores.
Reductions in the Black-White Difference on the Scholastic Aptitude and Achievement Tests, 1980-1991
White-Black Difference, in SDs
                                   1980   1991   Change
SAT-Verbal                         1.09    .87    -.22
Reading subscore                    .93    .83    -.10
Vocabulary subscore                1.09    .83    -.26
SAT-Math                           1.10    .90    -.20
Test of standard written English   1.11    .89    -.22
Achievement tests
Overall average                     .83    .78    -.05
English Composition                 .73    .71    -.02
Literature                          .86    .76    -.10
American History                    .69    .69     .00
European History                    .81    .56    -.25
Math I                              .75    .75     .00
Math II                             .98    .83    -.15
Biology                             .77    .68    -.09
Chemistry                           .69    .74    +.05
Physics                             .84    .74    -.10
French                              .33    .18    -.15
German                              .64    .27    -.37
Latin                               .66    .25    -.41
Spanish                             .50    .35    -.15
Source: The College Board’s annual summaries of test scores by ethnicity.
To interpret the changes in scores on achievement tests, which are taken by small proportions of the SAT test takers, we used the mean that the College Board provides on the SAT Verbal and Math scores for each achievement test population in each year. The question we asked was: For a given achievement test, how did the place of the average test taker on his race’s cognitive ability distribution change from 1980 to 1991? For example, the average white taking the Literature achievement test in 1980 had an SAT Verbal score that put him at the 80th percentile of white testees; in 1991, he was at the 85th percentile. Meanwhile, the average black taking the Literature achievement test in 1980 had an SAT Verbal score that put him at the 88th percentile of all black SAT testees; in 1991, he was still at the 88th percentile of the black distribution. The difference between blacks and whites on the Literature achievement test narrowed during that period, but, given where the blacks and whites were relative to the white and black SAT distributions, it seems unlikely that the narrowing was caused by changes in the self-selection that artificially raised black scores relative to whites. Ten of the thirteen achievement tests fit this pattern. In only three cases (European History, Physics, and German) did changes in the SAT Math or Verbal scores indicate that the black pool had become differentially more selective. Only in the case of German was this difference large enough to account plausibly for much of the black improvement relative to whites.
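The percentile comparison described above can be sketched as follows. The sketch assumes that SAT scores within each group are approximately normally distributed, which is a simplification introduced here and not a method stated in the text. The function names are hypothetical, and the means, standard deviations, and scores are placeholders, not College Board figures.

```python
from statistics import NormalDist


def percentile_within_group(score, group_mean, group_sd):
    """Percentile of `score` in a group's distribution (normality assumed)."""
    return 100 * NormalDist(mu=group_mean, sigma=group_sd).cdf(score)


def selectivity_shift(score_then, score_now, group_then, group_now):
    """Change in the average test taker's within-group percentile.

    `group_then` and `group_now` are (mean, sd) pairs for all SAT takers in
    the group in each year. A positive result means the pool taking the
    achievement test became more selective relative to its own group.
    """
    then = percentile_within_group(score_then, *group_then)
    now = percentile_within_group(score_now, *group_now)
    return now - then


# Hypothetical inputs: the group distribution is unchanged, and the average
# achievement-test taker scores the same in both years, so the shift is zero.
shift = selectivity_shift(550, 550, (450, 100), (450, 100))
```

A shift near zero for one group, alongside a rising percentile for the other, is the pattern the Literature example describes: the narrowing cannot easily be attributed to the first group's pool having become more selective.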
THE ACT. The College Board’s major competitor in the college entrance examination business is the American College Testing program, which has also shown decreasing differences between black and white students who take the test, as summarized in the table below. Reductions in the gap occurred in all the subtests between 1970 and 1991, with by far the largest reduction on the English subtest. The magnitude of the overall change in the composite is about half the size of the reduction observed in the black-white difference on the SAT. Like the SAT population, the ACT’s population of black test takers has been increasing, suggesting that the increases in scores are not the result of a more selective test-taking population.
Black-White Difference in the ACT, 1970-1991
White-Black Difference, in SDs
             1970   1991   Change
English      1.14    .83    -.31
Math          .86    .77    -.09
Science       .97    .91    -.06
Composite    1.12    .96    -.16
Source: ACT 1991, Tables 1, 4; Congressional Budget Office 1986, Fig. E-2.
THE GRADUATE RECORD EXAMINATION (GRE). The GRE is the equivalent of the SAT for admission to graduate school in the arts and sciences. Not many people in any cohort take the GRE, so the sample is obviously highly self-selected and atypical of the population. In 1988, for example, the number of white GRE test takers represented only 5.6 percent of the 22-year-old white population; black test takers represented 2.3 percent of the 22-year-old black population. On the other hand, the proportions in 1988 were about the same as they were in 1979. The self-selection process has remained fairly steady over the years, so it is worth at least mentioning the results, as shown in the table below. The GRE gap narrowed only slightly less than that for the SAT. Another positive note is that the narrowing was achieved because black scores rose more than white scores, not because white scores were falling.
Black-White Difference in the GRE, 1979-1988
White-Black Difference, in SDs
              1979   1988   Change
Verbal        1.25   1.13    -.12
Math          1.28   1.13    -.15
Analytical    1.46   1.21    -.25
Source: Graduate Record Examination Board.
These results from national tests are echoed in state-level data from Texas and North Carolina, as reported in the Congressional Budget Office’s survey of trends in educational achievement.30 Overall, the evidence seems clear beyond a reasonable doubt: On college entrance tests and national tests of educational proficiency, the gap between whites and blacks remained large into the early 1990s, but it had been narrowing in the preceding decade or two. The optimist may argue that the trend will continue indefinitely if improvements in the environment and education for American blacks can be continued. The pessimist may note that there seems to have been little narrowing since the mid-1980s, as we observed in the text for Chapter 13, and that the black-white IQ gap in the NLSY seems to be widening rather than narrowing in the next generation, as we discussed in Chapter 15.