High (setup work) .56 .65 2.5
Low (feeding/offbearing) .23 NA 2.4
The third major database bearing on this issue comes from the military, and it is in many ways the most satisfactory. The AFQT (Armed Forces Qualification Test) is extracted from the scores on several tests that everyone in the armed forces takes. It is an intelligence test, highly loaded on g. Everyone in the military goes to training schools, and everyone is measured for training success at the end of their schooling, with “training success” based on measures that directly assess job performance skills and knowledge. The job specialties in the armed forces include most of those found in the civilian world, as well as a number that are not (e.g., combat). The military keeps all of these scores in personnel files and puts them on computers. The resulting database has no equal in the study of job productivity.
We will be returning to the military data for a closer look when we turn to subjects for which they are uniquely suited. For now, we will simply point out that the results from the military conform to the results in the civilian job market. The results for training success in the four major job families are shown in the table below. These figures are based on data from 828 military schools and 472,539 military personnel. The average validity was .62. The results hold true for individual schools as well. Even for the lowest-validity school, combat, in which training success is heavily dependent on physical skills, the validity was still a substantial .45.20
The Validity of the AFQT for Military Training

Military Job Family      Mean Validity of AFQT Score and Training Success
Mechanical               .62
Clerical                 .58
Electronic               .67
General technical        .62

Source: Hunter 1985, Table 3.
The lowest modern estimate of validity for cognitive ability is the one contained in the report by a panel convened by the National Academy of Sciences, Fairness in Employment Testing.21 That report concluded that the mean validity is only about .25 for the GATB, in contrast to the Hunter estimate of .45 (which we cited earlier). Part of the reason was that the Hartigan committee (we name it for its chairman, Yale statistician John Hartigan), analyzing 264 studies after 1972, concluded that validities had generally dropped in the more recent studies. But the main source of the difference in validities is that the committee declined to make any correction whatsoever for restriction of range (see above and note 6). It was, in effect, looking at just the tackles already in the NFL; Hunter was considering the population at large. The Hartigan committee’s overriding concern, as the title of their report (Fairness in Employment Testing) indicates, was that tests not be used to exclude people, especially blacks, who might turn out to be satisfactory workers. Given that priority, the committee’s decision not to correct for restriction of range makes sense. But failing to correct for restriction of range produces a misleadingly low estimate of the overall relationship of IQ to job performance and its economic consequences.22 Had the Hartigan committee corrected for restriction of range, the estimates of the relationship would have been .35 to .40, not much less than Hunter’s.
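The size of a range-restriction correction can be illustrated with a short sketch. It uses the standard textbook formula for direct range restriction (often called Thorndike's Case II); the text does not say which formula lies behind the .35 to .40 figure, so this is an assumption. The ratios of standard deviations are hypothetical, chosen only to show how the correction behaves.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Standard correction for direct range restriction (Thorndike Case II).

    r_restricted: validity observed among people already hired.
    sd_ratio: SD of test scores in the full applicant population divided by
              the SD among those hired (1.0 means no restriction).
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


observed = 0.25  # uncorrected mean GATB validity cited in the text

# Hypothetical SD ratios; the text does not report the actual ratio.
for u in (1.0, 1.25, 1.5, 1.75):
    corrected = correct_for_range_restriction(observed, u)
    print(f"SD ratio {u:.2f}: corrected validity = {corrected:.2f}")
```

Under these assumptions, an applicant pool whose scores are about one and a half times as spread out as those of the people hired would lift an observed .25 into the .35 to .40 range.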
THE REASONS FOR THE LINK BETWEEN COGNITIVE ABILITY AND JOB PERFORMANCE
Why are job performance and cognitive ability correlated? Surgeons, for example, will be drawn from the upper regions of the IQ distribution. But isn’t it possible that all one needs is “enough” intelligence to be a surgeon, after which “more” intelligence doesn’t make much difference? Maybe small motor skills are more important. And yet “more” intelligence always seems to be “better,” for large groups of surgeons and every other profession. What is going on that produces such a result?
Specific Skills or g?
As we begin to explore this issue, the story departs more drastically from the received wisdom. One obvious, commonsense explanation is that an IQ test indirectly measures how much somebody knows about the specifics of a job and that that specific knowledge is the relevant thing to measure. According to this logic, more general intellectual capacities are beside the point. But the logic, however commonsensical, is wrong. Surprising as it may seem, the predictive power of tests for job performance lies almost completely in their ability to measure the most general form of cognitive ability, g, and has little to do with their ability to measure aptitude or knowledge for a particular job.
SPECIFIC SKILLS VERSUS G IN THE MILITARY. The most complete data on this issue come from the armed services, with their unique advantages as an employer that trains hundreds of thousands of people for hundreds of job specialties. We begin with them and then turn to the corresponding data from the civilian sector.
In assigning recruits to training schools, the services use particular combinations of subtests from a test battery that all recruits take, the Armed Services Vocational Aptitude Battery (ASVAB).23 The Pentagon’s psychometricians have tried to determine whether there is any practical benefit of using different weightings of the subtests for different jobs rather than, say, just using the overall score for all jobs. The overall score is itself tantamount to an intelligence test. One of the most comprehensive studies of the predictive power of intelligence tests was by Malcolm Ree and James Earles, who had both the intelligence test scores and the final grades from military school for over 78,000 air force enlisted personnel spread over eighty-nine military specialties. The personnel were educationally homogeneous (overwhelmingly high school graduates without college degrees), conveniently “controlling” for educational background.24
What explains how well they performed? For every one of the eighty-nine military schools, the answer was g, Charles Spearman’s general intelligence. The correlations between g alone and military school grade ranged from an almost unbelievably high .90 for the course for a technical job in avionics repair down to .41 for that for a low-skill job associated with jet engine maintenance.25 Most of the correlations were above .7. Overall, g accounted for almost 60 percent of the observed variation in school grades in the average military course, once the results were corrected for range restriction (the accompanying note spells out what it means to “account for 60 percent of the observed variation”).26
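In the usual statistical sense, the share of variation "accounted for" by a single predictor is the square of its correlation with the outcome. The sketch below applies that conversion to the correlations quoted above, assuming the text uses the phrase in this standard sense; the function names are illustrative.

```python
import math


def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a predictor with correlation r."""
    return r * r


def correlation_for_share(share: float) -> float:
    """Correlation needed for a single predictor to account for `share` of the variance."""
    return math.sqrt(share)


# Correlations quoted in the text for the highest and lowest courses.
for label, r in [("avionics repair course", 0.90),
                 ("jet engine maintenance course", 0.41)]:
    print(f"{label}: r = {r:.2f} -> {variance_explained(r):.0%} of variation")

# "Almost 60 percent of the observed variation" implies a correlation near .77.
print(f"60% of variation corresponds to r = {correlation_for_share(0.60):.2f}")
```

The conversion also shows why correlations can look more impressive than the shares of variance they imply: a correlation of .41 corresponds to only about one-sixth of the variation.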
Did cognitive factors other than g matter at all? The answer is that the explanatory power of g was almost thirty times greater than of all other cognitive factors in ASVAB combined. The table below gives a sampling of the results from the eighty-nine specialties, to illustrate the two commanding findings: g alone explains an extraordinary proportion of training success; “everything else” in the test battery explained very little.
The Role of g in Explaining Training Success for Various Military Specialties

                                    Percentage of Training Success Explained by:
Enlisted Military Skill Category    g        Everything Else
Nuclear weapons specialist          77.3     0.8
Air crew operations specialist      69.7     1.8
Weather specialist                  68.7     2.6
Intelligence specialist             66.7     7.0
Fireman                             59.7     0.6
Dental assistant                    55.2     1.0
Security police                     53.6     1.4
Vehicle maintenance                 49.3     7.7
Maintenance                         28.4     2.7

Source: Ree and Earles 1990a, Table 9.
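The contrast between g and "everything else" can be made concrete by computing ratios from the nine specialties sampled in the table. This is only a sketch over that sample: the "almost thirty times" figure in the text refers to all eighty-nine specialties, so the pooled ratio for these nine rows should not be expected to match it.

```python
# (specialty, % explained by g, % explained by everything else), from the table.
ROWS = [
    ("Nuclear weapons specialist", 77.3, 0.8),
    ("Air crew operations specialist", 69.7, 1.8),
    ("Weather specialist", 68.7, 2.6),
    ("Intelligence specialist", 66.7, 7.0),
    ("Fireman", 59.7, 0.6),
    ("Dental assistant", 55.2, 1.0),
    ("Security police", 53.6, 1.4),
    ("Vehicle maintenance", 49.3, 7.7),
    ("Maintenance", 28.4, 2.7),
]

# Ratio of g's explanatory share to that of all other factors, per specialty.
for name, g_share, other_share in ROWS:
    print(f"{name}: g explains {g_share / other_share:.1f} times as much")

# Pooled ratio across the nine sampled specialties only.
total_g = sum(g for _, g, _ in ROWS)
total_other = sum(other for _, _, other in ROWS)
pooled_ratio = total_g / total_other
print(f"Pooled ratio for this sample: {pooled_ratio:.1f}")
```

In every sampled specialty g explains several times as much as the other factors combined, though the size of the ratio varies widely from row to row.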
An even larger study, not quite as detailed, involving almost 350,000 men and women in 125 military specialties in all four armed services, confirmed the predominant influence of g and the relatively minor further predictive power of all the other factors extracted from ASVAB scores.27 Still another study, of almost 25,000 air force personnel in thirty-seven different military courses, similarly found that the validity of individual ASVAB subtests in predicting the final technical school grades was highly correlated with the g loading of the subtest.28
EVIDENCE FROM CIVILIAN JOBS. There is no evidence to suggest that military jobs are unique in their dependence on g. However, scholars in the civilian sector are at a disadvantage compared with their military colleagues; nothing approaches the military’s database on this topic. In one of the few major studies involving civilian jobs, performance in twenty-eight occupations correlated virtually as well with an estimate of g from GATB scores as it did with the most predictively weighted individual subtest scores in the battery.29 The author concluded that, for samples in the range of 100 to 200, a single factor, g, predicts job performance as well as, or better than, batteries of weighted subtest scores. With larger samples, for which it is possible to pick up the effect of less potent influences, there may be some modest extra benefit of specialized weighted scores. At no level of sampling, however, does g become anything less than the best single predictor known, across the occupational spectrum. Perhaps the most surprising finding has been that tests of general intelligence often do better in predicting future job performance than do contrived tests of job performance itself. Attempts to devise measures that are specifically keyed to a job’s tasks (for example, tests of filing, typing, answering the telephone, searching in records, and the like for an office worker) often yield low-validity tests, unless they happen to measure g, such as a vocabulary test. Given how pervasive g is, it is almost impossible to miss it entirely with any test, but some tests are far more efficient measures of it than others.30
Behind the Test Scores
Let us try to put these data in the framework of everyday experience. Why should it be that variation in general cognitive ability, g, is more important than job-specific skills and knowledge? We will use the job of busboy as a specific example, asking the question: At a run-of-the-mill family restaurant, what distinguishes a really good busboy from an average one?
Being a busboy is a straightforward job. The waiter takes the orders, deals with the kitchen, and serves the food while the busboy totes the dirty dishes out to the kitchen, keeps the water glasses filled, and helps the waiter serve or clear as required. In such a job, a high IQ is not required. One may be a good busboy simply with diligence and good spirits. But complications arise. A busboy usually works with more than one waiter. The restaurant gets crowded. A dozen things are happening at once. The busboy is suddenly faced with queuing problems, with setting priorities. A really good busboy gets the key station cleared in the nick of time, remembering that a table of new orders near that particular station is going to be coming out of the kitchen; when he goes to the kitchen, he gets a fresh water pitcher and a fresh condiment tray to save an extra trip. He knows which waiters appreciate extra help and when they need it. The point is one that should draw broad agreement from readers who have held menial jobs: Given the other necessary qualities of diligence and good spirits, intelligence helps. The really good busboy is engaged in using g when he is solving the problems of his job, and the more g he has, the more quickly he comes up with the solutions and can call on them when appropriate.
Now imagine devising a test that would enable an employer to choose the best busboy among applicants. One important aspect of the test would measure diligence and good spirits. Perhaps the employer should weigh the results of this part of the test more heavily than anything else, if his choice is between a diligent and cheerful applicant and a slightly smarter but sulky one. But when it comes to measuring performance in general for most applicants, it is easy to see why the results will match the findings of the literature we just discussed. Job-specific items reveal mostly whether an applicant has ever been a busboy before. But that makes very little difference to job productivity, because a bright person can pick up the basic routine in the course of a few shifts. The g-loaded items, on the other hand, will reveal whether the applicant will ever become the kind of busboy who will clear table 12 before he clears table 20 because he relates the needed task to something that happened twenty minutes earlier regarding table 15. And that is why employers who want to select productive busboys should give applicants a test of general intelligence rather than a test of busboy skills. The kind of test that would pass muster with the courts—a test of job-specific skills—is a less effective kind of test to administer. What applies to busboys applies ever more powerfully as the jobs become more complex.
DOES MORE EXPERIENCE MAKE UP FOR LESS INTELLIGENCE?
The busboy example leads to another question that bears on how we should think about cognitive ability and job productivity: How much can experience counterbalance ability? Yes, the smart busboy will be more productive than the less-smart busboy a week into the job, and, yes, perhaps there will always be a few things that the smart busboy can do that the less smart cannot. But will the initial gap in productivity narrow as the less-smart busboy gains experience? How much, and how quickly?
Separately, job performance relates to both experience and intelligence, but the relationships differ.31 That is, people who are new to a job learn quickly at first, then more slowly. A busboy who has, say, one month on the job may for that reason outperform someone who started today, but the one-month difference in experience will have ceased to matter in six months. No comparable leveling-off effect has been observed for increasing intelligence. Wherever on the scale of intelligence pairs of applicants are, the smarter ones not only will outperform the others, on the average, but the benefit of having a score that is higher by a given amount is approximately the same throughout the range. Or, to put it more conservatively, no one has produced good evidence of diminishing returns to intelligence.32
But what happens when both factors are considered jointly? Do employees of differing intelligence converge after some time on the job? If the answer were yes, then it could be argued that hiring less intelligent people imposes only a limited and passing cost. But the answer seems to be closer to no than to yes, although much remains to be learned.
Some convergence has been found when SATs are used as the measure of ability and grade point average is used as the measure of achievement.33 Students with differing SATs sometimes differ more in their freshman grades than in later years. That is why President Bok granted predictive value to the SAT only for first-year grades.34 On the other hand, the shrinking predictive power may be because students learn which courses they are likely to do well in: They drop out of physics or third-year calculus, for example, and switch to easier courses. They find out which professors are stingy with A’s and B’s. At the U.S. Military Academy, where students have very little choice in courses, there is no convergence in grades.35
When it comes to job performance, the balance of the evidence is that either convergence does not occur or the degree of convergence is small. This was the finding of a study of over 23,000 civilian employees at three levels of mental ability (high, medium, and low), using supervisor ratings as the measure of performance, and it extended out to job tenures of twenty years and more.36 A study of four military specialties (armor repairman, armor crewman, supply specialist, cook) extending out to five years of experience and using three different measures of job performance (supervisor’s ratings, work sample, and job knowledge) found no reliable evidence of convergence.37 Still another military study, which examined several hundred marines working as radio repairmen, automotive mechanics, and riflemen, found no convergence among personnel of differing intelligence when job knowledge was the measure of performance but did find almost complete convergence after a year or so when a work sample was the measure.38
Other studies convey a similarly mixed picture.39 Some experts are at this point concluding that convergence is uncommon in the ordinary range of jobs.40 It may be said conservatively that for most jobs, based on most measures of productivity, the difference in productivity associated with differences in intelligence diminishes only slowly and partially. Often it does not diminish at all. The cost of hiring less intelligent workers may last as long as they stay on the job.
TEST SCORES COMPARED TO OTHER PREDICTORS OF PRODUCTIVITY
How good a predictor of job productivity is a cognitive test score compared to a job interview? Reference checks? College transcript? The answer, probably surprising to many, is that the test score is a better predictor of job performance than any other single measure. This is the conclusion to be drawn from a meta-analysis on the different predictors of job performance, as shown in the table below.
The Validity of Some Different Predictors of Job Performance

Predictor                Validity Predicting Job Performance Ratings
Cognitive test score     .53
Biographical data        .37
Reference checks         .26
Education                .22
Interview                .14
College grades           .11
Interest                 .10
Age                      −.01

Source: Hunter and Hunter 1984.
The data used for this analysis were top-heavy with higher-complexity jobs, yielding a higher-than-usual validity of .53 for test scores. However, even if we were to substitute the more conservative validity estimate of .4, the test score would remain the best predictor, though with close competition from biographical data.41 The method that many people intuitively expect to be the most accurate, the job interview, has a poor record as a predictor of job performance, with a validity of only .14.
Readers who are absolutely sure nonetheless that they should trust their own assessment of people rather than a test score should pause to consider what this conclusion means. It is not that you would select a markedly different set of people through interviews than test scores would lead you to select. Many of the decisions would be the same. The results in the table say, in effect, that among those choices that would be different, the employees chosen on the basis of test scores will on average be more productive than the employees chosen on the basis of any other single item of information.
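What these validities imply for the people actually hired can be sketched with a standard result from selection theory: if predictor and performance are jointly normal and an employer hires the top scorers, the expected standardized performance of those hired equals the validity times their mean standardized predictor score. The 10 percent selection ratio and the normality assumption below are hypothetical and are not taken from the source; only the validities come from the table.

```python
from statistics import NormalDist

# Validities from the table (Hunter and Hunter 1984).
VALIDITIES = {
    "Cognitive test score": 0.53,
    "Biographical data": 0.37,
    "Reference checks": 0.26,
    "Education": 0.22,
    "Interview": 0.14,
    "College grades": 0.11,
    "Interest": 0.10,
    "Age": -0.01,
}


def mean_predictor_z_of_selected(selection_ratio: float) -> float:
    """Mean standardized predictor score of the top `selection_ratio` fraction,
    assuming the predictor is normally distributed among applicants."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - selection_ratio)
    return std_normal.pdf(cutoff) / selection_ratio


def expected_performance_z(validity: float, selection_ratio: float) -> float:
    """Expected standardized job performance of those hired top-down on a
    predictor with the given validity (linear, bivariate-normal assumption)."""
    return validity * mean_predictor_z_of_selected(selection_ratio)


SELECTION_RATIO = 0.10  # hypothetical: the employer hires the top 10 percent

for name, validity in VALIDITIES.items():
    gain = expected_performance_z(validity, SELECTION_RATIO)
    print(f"{name}: expected performance of hires = {gain:+.2f} SD")
```

Under these assumptions the expected gain is proportional to validity, so hiring on a predictor with validity .53 yields nearly four times the average performance advantage of hiring on one with validity .14, whatever the selection ratio.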
THE DIFFERENCE INTELLIGENCE MAKES
We arrive finally at the question of what it all means. How important is the overall correlation of .4, which we are using as our benchmark for the relation between intelligence and job performance? The temptation may be to say, not very. As we showed before, there will be many exceptions to the predicted productivity with correlations this modest. And indeed it is not very important when an employer needs just a few new employees for low-complexity jobs and is choosing among a small group of job applicants who have small differences in test scores. But the more reality departs from this scenario, the more important cognitive ability becomes.