The Bell Curve: Intelligence and Class Structure in American Life


by Richard J. Herrnstein and Charles Murray


  16 Terman and Oden 1947.

  17 The NLSY cannot answer that question, because even a sample of 11,878 (the number that took the AFQT) is too small to yield adequate sample sizes for analyzing subgroups in the top tenth of the top percentile.

  18 There are not that many people with IQs of 120+ left over, after the known concentrations of them in the high IQ occupations are taken into account.

  19 The literature is extensive. The studies used for this discussion, in addition to those cited specifically, include Bendix 1949; Macmahon and Millett 1939; Pierson 1969; Stanley, Mann, and Doig 1967; Sturdivant and Adler 1976; Vance 1966; Warner and Abegglen 1955.

  20 Newcomer 1955, Table 24, p. 68.

  21 Clews 1908, pp. 27, 37, quoted in Newcomer 1955, p. 66.

  22 The data are drawn from Newcomer 1955.

  23 Burck 1976. The Fortune survey was designed to yield data comparable with those in Newcomer 1955.

  24 The ostensible decline in college degrees after 1950 is explained by college graduates’ going on to get additional educational credentials. For another study of educational attainment of CEOs that shows the same pattern, see Priest 1982.

  25 U.S. Bureau of the Census 1992, Tables 18, 615, and U.S. Department of Labor 1991, Table 22.

  26 Excluding accountants, who were already counted in the high-IQ professions.

  27 Matarazzo 1972, Table 7.3, p. 178.

  Chapter 3

  1 Bok 1985b. In another setting, again discussing the SAT, he wrote, “Such tests are only modestly correlated with subsequent academic success and give no reliable indication of achievement in later life” (Bok 1985a, p. 15).

  2 The correlation of IQ with income in a restricted population such as Harvard graduates could be negative when people toward the top of the IQ distribution are disproportionately drawn into academia, where they make a decent living but seldom much more than that, while students with IQs of “only” 120 and 130 will more often go into the business world, where they may get rich.

  3 See Chapter 19; Dunnette 1976; Ghiselli 1973.

  4 Technically, a correlation coefficient is a ratio, with the covariation of the two variables in the numerator and the product of the separate standard deviations of the two variables in the denominator. The formula for computing a Pearson product moment correlation r (the kind that we will be using throughout) is

  $$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^{2}}\,\sqrt{\sum (Y - \bar{Y})^{2}}} $$

  where X and Y refer to the actual values for each case and \bar{X} and \bar{Y} refer to the mean values of X and Y, respectively.
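
  In code, the computation is equally simple. The following is a minimal sketch in Python; the function and the sample values are ours, invented purely for illustration.

```python
# Minimal sketch of the Pearson product moment correlation: the sum of the
# cross-products of deviations from the means, divided by the product of the
# square roots of the summed squared deviations. Data values are invented.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(ss_x * ss_y)

iq = [95, 100, 105, 110, 120, 130]
income = [18, 22, 25, 24, 35, 40]   # hypothetical, in thousands of dollars
print(round(pearson_r(iq, income), 2))
```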

  5 We limited the sample to families making less than $100,000, so as to avoid some distracting technical issues that arise when analyzing income across the entire spectrum (e.g., the appropriateness of using logged values rather than raw values). The results from the 1 percent sample are in line with the statistics produced when the analysis is repeated for the entire national sample: a correlation of .31 and an increment of $2,700 per year of additional education. Income data are for 1989, expressed in 1990 dollars.
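
  For readers who want to see the mechanics, here is a minimal sketch of this kind of calculation in Python, using simulated data in place of the NLSY; the sample size, the income equation, and the way the $100,000 cap is applied are illustrative assumptions of ours, not the actual data.

```python
# Sketch of the calculation in this note: correlate years of education with
# income among families below $100,000 and estimate the dollar increment per
# additional year of schooling. The simulated data are purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
education = rng.integers(8, 21, size=5_000)                  # years of schooling
income = 20_000 + 2_700 * education + rng.normal(0, 15_000, 5_000)

mask = income < 100_000                                      # cap used in the note
educ, inc = education[mask], income[mask]

r = np.corrcoef(educ, inc)[0, 1]                             # Pearson correlation
slope = np.polyfit(educ, inc, 1)[0]                          # dollars per extra year
print(f"r = {r:.2f}, increment per year of education = ${slope:,.0f}")
```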

  6 An important distinction: The underlying relationship persists in a sample with restricted range, but the restriction of range makes the relationship harder to identify (i.e., the correlation coefficient is attenuated, sometimes to near zero).

  Forgetting about restriction of range produces fallacious reasoning that is remarkably common, even among academics who are presumably familiar with the problem. For example, psychologist David McClelland, writing at the height of the anti-IQ era in 1973, argued against any relationship between career success and IQ, pointing out that whereas college graduates got better jobs than nongraduates, the academic records of graduates did not correlate with job success, even though college grades correlate with IQ. He added, anecdotally, that he recalled his own college class—Wesleyan University, a top-rated small college—and was convinced that the eight best and eight worst students in his class had not done much differently in their subsequent careers (McClelland 1973). This kind of argument is also common in everyday life, as in the advice offered by friends during the course of writing this book. There was, for example, our friend the nuclear physicist, who prefaced his remarks by saying, “I don’t think I’m any smarter than the average nuclear physicist …” Or an engineer friend, a key figure in the Apollo lunar landing program, who insisted that this IQ business is much overemphasized. He had been a C student in college and would not even have graduated, except that he managed to pull himself together in his senior year. His conclusion was that motivation was important, not IQ. Did he happen to know what his IQ was? Sure, he replied. It was 146. He was right, insofar as motivation can make the difference between being a first-rate rocket scientist and a mediocre one—if you start with an IQ of 146. But people with scores of 146 or above make up something less than 0.2 percent of the population. Similarly, correlations of IQ and job success among college graduates suffer from restriction of range. The more selective the group, the greater the restriction, which is why Derek Bok may plausibly (if not quite accurately) have claimed that SAT scores have “no correlation at all with what you do in the rest of your life” if he was talking about Harvard students.
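
  A small simulation makes the attenuation concrete. The true correlation, the IQ cutoff, and the sample below are invented for illustration only.

```python
# Illustrative sketch of restriction of range: the same underlying relationship
# yields a much smaller correlation once the sample is truncated at the top of
# the ability distribution. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_r = 0.5
z_iq = rng.standard_normal(n)
z_outcome = true_r * z_iq + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
iq = 100 + 15 * z_iq                              # IQ on the familiar scale

full_r = np.corrcoef(iq, z_outcome)[0, 1]         # close to the true 0.5
selected = iq > 130                               # a highly restricted group
restricted_r = np.corrcoef(iq[selected], z_outcome[selected])[0, 1]

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```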

  7 E.g., Fallows 1985.

  8 See Chapter 20 for more detail.

  9 Griggs v. Duke Power, 401 U.S. 424 (1971).

  10 The doctrine has been built into the U.S. Employment and Training Service’s General Aptitude Test Battery (GATB), into the federal civil service’s Professional and Administrative Career Examination (PACE), and into the military’s Armed Services Vocational Aptitude Battery (ASVAB). Bartholet 1982; Braun 1992; Gifford 1989; Kelman 1991; Seymour 1988. For a survey of test instruments and their use, see Friedman and Williams 1982.

  11 For a recent review of the expert community as a whole, see Schmidt and Ones 1992.

  12 Hartigan and Wigdor 1989 and Schmidt and Hunter 1991 represent the two ends of the range of expert opinion.

  13 For a sampling of the new methods, see Bangert-Drowns 1986; Glass 1976; Glass, McGaw, and Smith 1981; Hunter and Schmidt 1990. Meta-analytic strategies had been tried for decades prior to the 1970s, but it was after the advent of powerful computers and statistical software that many of the techniques became practicable.

  14 Hartigan and Wigdor 1989; Hunter and Schmidt 1990; Schmidt and Hunter 1981.

  15 We have used the terms job productivity or job performance or performance ratings without explaining what they mean or how they are measured. On the other hand, all of us have a sense of what job productivity is like—we are confident that we know who are the better and worse secretaries, managers, and colleagues among those with whom we work closely. But how is this knowledge to be captured in objective measures? Ratings by supervisors or peers? Samples of work in the various tasks that a job demands? Tests of job knowledge? Job tenure or promotion? Direct cost accounting of workers’ output? There is no way to answer such a question decisively, for people may legitimately disagree about what it is about a worker’s performance that is most worth predicting. As a practical matter, ratings by supervisors, being the most readily obtained and the least intrusive in the workplace, have dominated the literature (Hunter 1986). But it is natural to wonder whether supervisor ratings, besides being easy to get, truly measure how well workers perform rather than, say, how they get along with the boss or how they look (Guion 1983).

  To get a better fix on what the various measures of performance mean, it is useful to evaluate a number of studies that have included measures of cognitive ability, supervisor ratings, samples of work, and tests of job knowledge. Work samples are usually obtained by setting up stations for workers to do the various tasks required by their jobs and having their work evaluated in some reasonably objective way. Different occupations lend themselves more or less plausibly to this kind of simulated performance. The same is true of written or oral tests of job knowledge.

  One of the field’s leaders, John Hunter, has examined the correlational structure that relates these different ways of looking at job performance to each other and to an intelligence test score (Hunter 1983, 1986). In a study of 1,800 workers, Hunter found a strong direct link between intelligence and job knowledge and a much smaller direct one between intelligence and performance in work sample tasks. By direct we mean that the variables predict each other without taking any other variable into account. The small direct link between intelligence and work sample was augmented by a large indirect link, via job knowledge: a person’s intelligence predicted his knowledge of the job, and his knowledge in turn predicted his work sample. The correlation (after the usual statistical corrections) between intelligence and job knowledge was .8; between intelligence and work sample it was .75. The indirect link between intelligence and work sample, via job knowledge, was larger by half than the direct one (Hunter 1986).
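
  The decomposition can be written out explicitly. In the sketch below, the path coefficients are hypothetical stand-ins chosen only to be roughly consistent with the figures just quoted; they are not Hunter’s published path estimates.

```python
# Sketch of the direct/indirect decomposition: in a simple path model, the
# implied correlation between intelligence and the work sample equals the
# direct path plus the product of the paths running through job knowledge.
# All three coefficients are hypothetical stand-ins, not Hunter's estimates.
path_iq_knowledge = 0.80        # intelligence -> job knowledge
path_knowledge_sample = 0.56    # job knowledge -> work sample
path_iq_sample_direct = 0.30    # intelligence -> work sample (direct)

indirect = path_iq_knowledge * path_knowledge_sample    # about .45
implied_r = path_iq_sample_direct + indirect             # about .75
print(f"direct = {path_iq_sample_direct:.2f}, indirect = {indirect:.2f}, "
      f"implied correlation = {implied_r:.2f}")
```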

  The correlation between intelligence and supervisor ratings in Hunter’s analysis was .47. Upon analysis, Hunter found that the primary reason is that brighter workers know more about their jobs, and supervisors respond favorably to their knowledge. A comparable analysis of approximately 1,500 military personnel in four specialties produced the same basic finding (Hunter 1986). This may seem a weakness of the supervisor rating measure, but is it really? How much workers know about their jobs correlates, on the one hand, with their intelligence and, on the other, with both how they do on direct tests of their work and how they are rated by their supervisors. A worker’s intelligence influences how much he learns about the job, and job knowledge contributes to proficiency. That knowledge also shapes the impression the worker makes on a supervisor more than the work itself does, as measured by a work sample test (which, of course, the supervisor may never see in the ordinary course of business). Using supervisor rating as a measure of proficiency is thereby justified, without having to claim that the rating directly measures proficiency.

  Hunter found that work samples are more dependent on intelligence and job knowledge than are supervisor ratings. Supervisor ratings, which are so predominant in this literature, may, in other words, underestimate how important intelligence is for proficiency. Recent research suggests that supervisor ratings in fact do underestimate the correlation between intelligence and productivity (Becker and Huselid 1992). But we should acknowledge again that none of the measures of proficiency—work samples, supervisor ratings, or job knowledge tests—is free of the taint of artificiality, let alone arbitrariness. Supervisor ratings may be biased in many ways; a test of job knowledge is a test, not a job; and even a worker going from one work station to another under the watchful eye of an industrial psychologist may be revealing something other than everyday competence. It has been suggested that the various contrived measures of workers tell us more about maximum performance than they do about typical, day-to-day proficiency (Guion 1983). We therefore advise that the quantitative estimates we present here (or that can be found in the technical literature at large) be considered only tentative and suggestive.

  16 The average validity of .4 is obtained after standard statistical corrections of various sorts. The two most important of these are a correction for test unreliability or measurement error and a correction for restriction of range among the workers in any occupation. All of the validities in this section of the chapter are similarly corrected, unless otherwise noted.
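
  For readers curious about what these corrections do, here is a sketch of the two standard formulas; the observed validity, the reliability, and the standard deviation ratio are invented values, not figures from any particular study.

```python
# Sketch of the two standard corrections mentioned in this note; all numbers
# below are invented for illustration, not values from any particular study.
import math

def correct_for_unreliability(r_obs, reliability):
    """Classical correction for attenuation: divide the observed correlation
    by the square root of the reliability of the imperfect measure."""
    return r_obs / math.sqrt(reliability)

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct restriction of range, where
    sd_ratio is the unrestricted over the restricted standard deviation
    of the predictor."""
    u = sd_ratio
    return (r_restricted * u) / math.sqrt(1 - r_restricted**2 + (r_restricted**2) * u**2)

r_observed = 0.25                       # hypothetical validity among hired workers
r_step1 = correct_for_unreliability(r_observed, 0.60)
r_step2 = correct_for_range_restriction(r_step1, 1.5)
print(f"observed = {r_observed:.2f}, after both corrections = {r_step2:.2f}")
```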

  17 Ghiselli 1966, 1973; Hunter and Hunter 1984, Table 1.

  18 Hunter 1980; Hunter and Hunter 1984.

  19 Where available, ratings by peers, tests of job knowledge, and actual work samples often come close to ability measures as predictors of job performance (Hunter and Hunter 1984). But aptitude tests have the practical advantage that they can be administered relatively inexpensively to large numbers of applicants, and they do not depend on applicants’ having been on the job for any length of time.

  20 E. F. Wonderlic & Associates 1983; Hunter 1989. These validities, which are even higher than the ones presented in the table on page 74, are for training success rather than for measures of job performance and are more directly comparable with the column for training success in the GATB studies than with the column for job proficiency. Regarding job performance, one major study evaluated the performance of about 1,500 air force enlisted men and women working in eight military specialties, chosen to be representative of military specialties in the air force. Performance was variously measured: by defining a set of tasks involved in each job, then training a group of evaluators to assess those specific tasks; by interviews of the personnel on technical aspects of their jobs; by supervisor ratings after training the supervisors; and by combinations of methods. The average correlation between AFQT score and a hands-on job performance measure was .40, with the highest among the precision measurement equipment specialists and the avionics communications specialists and the lowest among the air traffic control operators and the air crew life support specialists. Insofar as the jobs were restricted to those held by enlisted personnel, the distribution of jobs was somewhat skewed toward the lower end of the skill range. We do not have an available estimate of the validity of the AFQT over all military jobs.

  21 Hartigan and Wigdor 1989.

  22 It is one of the chronically frustrating experiences of reading scientific results: Two sets of experts, supposedly using comparable data, come out with markedly different conclusions, and the reasons for the differences are buried in technical and opaque language. How is it possible for a layperson to decide who is right? The different estimates of the mean validity of the GATB—.45 according to Hunter, Schmidt, and some others; .25 according to the Hartigan committee—are an instructive case in point.

  Sometimes the differences really are technical and opaque. For example, the Hartigan committee based its estimate on the assumption that the reliability of supervisor ratings was higher than other studies assumed—.8 instead of .6 (Hartigan and Wigdor 1989, p. 170). By assuming a higher reliability, the committee made a smaller correction for measurement error than Hunter did. Choosing between the Hartigan committee’s .8 and Hunter’s .6 as the reliability of supervisor ratings is impossible for anyone who is not intimately familiar with a large and scattered literature on that topic, and even then the choice remains a matter of judgment. But the Hartigan committee’s decision not to correct for restriction of range, which makes the largest difference in their estimates of the overall validity, is based on a much different kind of disagreement. Here, a layperson is as qualified to decide as an expert, for this is a disagreement about what question is being answered.
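
  The effect of the reliability assumption by itself is easy to see. In the sketch below, the observed validity is an invented stand-in, not a figure from either analysis.

```python
# Sketch of the reliability disagreement: the same observed validity corrected
# under the committee's assumed rating reliability (.8) and Hunter's (.6).
# The observed value of .20 is an invented stand-in for illustration.
import math

r_observed = 0.20
for reliability in (0.8, 0.6):
    corrected = r_observed / math.sqrt(reliability)
    print(f"assumed reliability {reliability}: corrected validity = {corrected:.2f}")
```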

  John Hunter and others assumed that for any job the applicant pool is the entire U.S. work force. That is, they sought an answer to the question, “What is the relationship between job performance and intelligence for the work force at large?” The Hartigan committee objected to their assumption on grounds that, in practice, the applicant pool for any particular job is not the entire U.S. work force but people who have a chance to get the job. As they accurately noted, “People gravitate to jobs for which they are potentially suited” (Hartigan and Wigdor 1989, p. 166).

  But embedded in the committee’s objection to Hunter’s estimates is a tacit switch in the question that the analysis is supposed to answer. The Hartigan committee sought an answer to the question, “Among those people who apply for such-and-such a position, what is the relationship between intelligence and job performance?” If one’s objective is not to discourage people who weigh only 250 pounds from applying for jobs as tackles in the NFL, to return to our analogy, then the Hartigan committee’s question is the appropriate one. Of course, by minimizing the validity of weight, this approach may encourage a large number of 150-pound linemen to apply for the jobs. We therefore conclude that the assumption used by Hunter and Schmidt (among others), that restriction of range calculations should be based on the entire work force, is self-evidently the appropriate choice if one wants to know the overall relationship of IQ to job performance and its economic consequences.

  23 The ASVAB comprises ten subtests: General Science, Arithmetic Reasoning, Word Knowledge, Paragraph Comprehension, Numerical Operations, Coding Speed, Auto/Shop Information, Mathematics Knowledge, Mechanical Comprehension, and Electronics Information. Only Numerical Operations and Coding Speed are highly speeded; the other eight are nonspeeded “power” tests. All the armed services use the four MAGE composites, for Mechanical, Administrative, General, and Electronics specialties, each of which includes three or four subtests in a particular weighting. These composites are supposed to predict a recruit’s trainability for the particular specialty. The AFQT is yet another composite from the ASVAB, selected so as to measure g efficiently. See Appendix 3.

  24 About 80 percent of the sample had graduated from high school and had no further civilian schooling, fewer than 1 percent had failed to graduate from high school, and fewer than 2 percent had graduated from college; the remainder had some post-high school civilian schooling short of a college degree. The modal person in the sample was a white male between 19 and 20 years old, but the sample also included thousands of women and people from all American ethnic groups; their ages ranged upward from a minimum of 17, with almost 15 percent older than 23 (see Ree and Earles 1990b). Other studies, using educationally heterogeneous samples, have in fact shown that, holding AFQT constant, high school graduates are more likely to avoid disciplinary action, to be recommended for reenlistment, and to be promoted to higher rank than nongraduates (Office of the Assistant Secretary of Defense 1980). Current enlistment policies reflect the independent predictiveness of education, in that of two applicants with equal AFQT scores, the high school graduate is selected over the nongraduate if only one is to be accepted.

  25 In fact, there may be some upward bias in these correlations, inasmuch as they were not cross validated to exclude capitalization on chance.

  26 What does it mean to “account for the observed variation”? Think of it in this way: A group of recruits finishes its training course; their grades vary. How much less would they have varied had they entered the course with the same level of g? This may seem like a hypothetical question, but it is answered simply by squaring the correlation between the recruits’ level of g and their final grades. In general, given any two variables, the degree to which variation in either is explained (or accounted for, in statistical lingo) by the other variable is obtained by squaring the correlation between them. For example, a perfect correlation of 1 between two variables means that each of the variables fully explains the observed variations in the other. When two variables are perfectly correlated, they are also perfectly redundant since if we know the value of one of them, we also know the value of the other without having to measure it. Hence, 1 squared is 1.0 or 100 percent. A correlation of .5 means that each variable explains, or accounts for, 25 percent of the observed variation in the other; a correlation of 0 means that neither variable accounts for any of the observed variation in the other.
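
  The arithmetic in a few lines (our illustration):

```python
# The "variance accounted for" arithmetic: squaring a correlation gives the
# share of variation in one variable explained by the other.
for r in (1.0, 0.5, 0.0):
    print(f"correlation {r:.1f} -> {r * r:.0%} of the variation explained")
```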

 
