The Bell Curve: Intelligence and Class Structure in American Life


by Richard J. Herrnstein and Charles Murray


  Working from these observations, Galton tried to devise an intelligence test as we understand the term today: a set of items probing intellectual capacities that could be graded objectively. Galton had the idea that intelligence would surface in the form of sensitivity of perceptions, so he constructed tests that relied on measures of acuity of sight and hearing, sensitivity to slight pressures on the skin, and speed of reaction to simple stimuli. His tests failed, but others followed where Galton had led. His most influential immediate successor, a French psychologist, Alfred Binet, soon developed questions that attempted to measure intelligence by measuring a person’s ability to reason, draw analogies, and identify patterns.3 These tests, crude as they were by modern standards, met the key criterion that Galton’s tests could not: Their results generally accorded with common understandings of high and low intelligence.

  By the end of the nineteenth century, mental tests in a form that we would recognize today were already in use throughout the British Commonwealth, the United States, much of continental Europe, and Japan.4 Then, in 1904, a former British Army officer named Charles Spearman made a conceptual and statistical breakthrough that has shaped both the development and much of the methodological controversy about mental tests ever since.5

  By that time, considerable progress had been made in statistics. Unlike Galton in his early years, investigators in the early twentieth century had available to them an invaluable number, the correlation coefficient first devised by Galton himself in 1888 and elaborated by his disciple, Karl Pearson.6 Before the correlation coefficient was available, scientists could observe that two variables, such as height and weight, seemed to vary together (the taller the heavier, by and large), but they had no way of saying exactly how much they were related. With Pearson’s r, as the coefficient was labeled, they now could specify “how much” of a relationship existed, on a scale ranging from a minimum of −1 (for perfectly inverse relationships) to +1 (for perfectly direct relationships).
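
  For readers who want the statistic itself, which the text does not spell out, Pearson's r for n paired observations takes the now-standard form:

  $$ r \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

  A value near +1 means that the taller really are almost always the heavier; a value near 0 means that knowing one variable tells you almost nothing about the other.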

  Spearman noted that as the data from many different mental tests were accumulating, a curious result kept turning up: If the same group of people took two different mental tests, anyone who did well (or poorly) on one test tended to do similarly well (or poorly) on the other. In statistical terms, the scores on the two tests were positively correlated. This outcome did not seem to depend on the specific content of the tests. As long as the tests involved cognitive skills of one sort or another, the positive correlations appeared. Furthermore, individual items within tests showed positive correlations as well. If there was any correlation at all between a pair of items, a person who got one of them right tended to get the other one right, and vice versa for those who got it wrong. In fact, the pattern was stronger than that. It turned out to be nearly impossible to devise items that plausibly measured some cognitive skill and were not positively correlated with other items that plausibly measured some cognitive skill, however disparate the pair of skills might appear to be.

  The size of the positive correlations among the pairs of items in a test did vary a lot, however, and it was this combination—positive correlations throughout the correlation matrix, but of varying magnitudes—that inspired Spearman’s insight.7 Why are almost all the correlations positive? Spearman asked. Because, he answered, they are tapping into the same general trait. Why are the magnitudes different? Because some items are more closely related to this general trait than others.8

  Spearman’s statistical method, an early example of what has since become known as factor analysis, is complex, and we will explore some of those complexities. But, for now, the basis for factor analysis can be readily understood. Insofar as two items tap into the same trait, they share something in common. Spearman developed a method for estimating how much sharing was going on in a given set of data. From almost any such collection of mental or academic test scores, Spearman’s method of analysis uncovered evidence for a unitary mental factor, which he named g, for “general intelligence.” The evidence for a general factor in intelligence was pervasive but circumstantial, based on statistical analysis rather than direct observation. Its reality therefore was, and remains, arguable.
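
  Spearman's own computations predate modern matrix software and differed in detail from present-day factor analysis; the sketch below is only a simplified modern stand-in for the idea, treating the leading eigenvector (the first principal component) of a correlation matrix as a proxy for the shared factor. The five "tests," their correlations, and all names in the code are invented for illustration and do not come from the text.

```python
# Illustrative sketch only: extracting a single general factor from a
# hypothetical matrix of test-score correlations via the leading eigenvector.
import numpy as np

# Invented correlation matrix for five mental tests: every entry positive,
# but of varying size, the pattern Spearman noticed.
R = np.array([
    [1.00, 0.62, 0.55, 0.48, 0.40],
    [0.62, 1.00, 0.58, 0.45, 0.38],
    [0.55, 0.58, 1.00, 0.50, 0.42],
    [0.48, 0.45, 0.50, 1.00, 0.35],
    [0.40, 0.38, 0.42, 0.35, 1.00],
])

# eigh returns eigenvalues in ascending order; the last is the largest.
eigenvalues, eigenvectors = np.linalg.eigh(R)
loadings = np.abs(eigenvectors[:, -1] * np.sqrt(eigenvalues[-1]))

# How much of the total variance the single shared factor accounts for,
# and how strongly each test "loads" on it.
print("Share of variance on the first factor:",
      round(eigenvalues[-1] / eigenvalues.sum(), 2))
print("Loadings:", np.round(loadings, 2))
```

  In data with this one-factor structure, every test loads substantially on the same factor, but some more than others, which is the pattern of uniformly positive correlations of varying magnitude that led Spearman to posit g.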

  Spearman then made another major contribution to the study of intelligence by defining what this mysterious g represented. He hypothesized that g is a general capacity for inferring and applying relationships drawn from experience. Being able to grasp, for example, the relationship between a pair of words like harvest and yield, or to recite a list of digits in reverse order, or to see what a geometrical pattern would look like upside down, are examples of tasks (and of test items) that draw on g as Spearman conceived of it. This definition of intelligence differed subtly from the more prevalent idea that intelligence is the ability to learn and to generalize what is learned. The course of learning is affected by intelligence, in Spearman’s view, but it was not the thing in itself. Spearmanian intelligence was a measure of a person’s capacity for complex mental work.

  Meanwhile, other testers in Europe and America continued to refine mental measurement. By 1908, the concept of mental level (later called mental age) had been developed, followed in a few years by a slightly more sophisticated concept, the intelligence quotient. IQ at first was just a way of expressing a person’s (usually a child’s) mental level relative to his or her contemporaries. Later, as the uses of testing spread, IQ became a more general way to express a person’s intellectual performance relative to a given population. Already by 1917, soon after the concept of IQ was first defined, the U.S. Army was administering intelligence tests to classify and assign recruits for World War I. Within a few years, the letters “IQ” had entered the American vernacular, where they remain today as a universally understood synonym for intelligence.
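
  The early "ratio IQ" alluded to here, a formulation the text does not spell out, simply compared mental age (MA) with chronological age (CA):

  $$ \mathrm{IQ} \;=\; 100 \times \frac{\mathrm{MA}}{\mathrm{CA}} $$

  A ten-year-old performing at the level of a typical twelve-year-old thus scored 100 x 12/10 = 120, while a child performing at his or her own age level scored 100. Later tests dropped the ratio in favor of scores scaled against the distribution of a reference population, the "deviation IQ" still in use today.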

  To this point, the study of cognitive abilities was a success story, representing one of the rare instances in which the new soft sciences were able to do their work with a rigor not too far short of the standards of the traditional sciences. A new specialty within psychology was created, psychometrics. Although the debates among the psychometricians were often fierce and protracted, they produced an expanded understanding of what was involved in mental capacity. The concept of g survived, embedded in an increasingly complex theory of the structure of cognitive abilities.

  Because intelligence tests purported to test rigorously an important and valued trait about people (including ourselves and our loved ones), IQ also became one of the most visible and controversial products of social science. The first wave of public controversy occurred during the first decades of the century, when a few testing enthusiasts proposed using the results of mental tests to support outrageous racial policies. Sterilization laws were passed in sixteen American states between 1907 and 1917, with the elimination of mental retardation being one of the prime targets of the public policy. “Three generations of imbeciles are enough,” Justice Oliver Wendell Holmes declared in an opinion upholding the constitutionality of such a law.9 It was a statement made possible, perhaps encouraged, by the new enthusiasm for mental testing.

  In the early 1920s, the chairman of the House Committee on Immigration and Naturalization appointed an “Expert Eugenical Agent” for his committee’s work, a biologist who was especially concerned about keeping up the American level of intelligence by suitable immigration policies.10 An assistant professor of psychology at Princeton, Carl C. Brigham, wrote a book entitled A Study of American Intelligence using the results of the U.S. Army’s World War I mental testing program to conclude that an influx of immigrants from southern and eastern Europe would lower native American intelligence, and that immigration therefore should be restricted to Nordic stock (see the box about tests and immigration).11

  Fact and Fiction About Immigration and Intelligence Testing

  Two stories about early IQ testing have entered the folklore so thoroughly that people who know almost nothing else about that history bring them up at the beginning of almost any discussion of IQ. The first story is that Jews and other immigrant groups were thought to be below average in intelligence, even feebleminded, which goes to show how untrustworthy such tests (and the testers) are. The other story is that IQ tests were used as the basis for the racist immigration policies of the 1920s, which shows how dangerous such tests (and the testers) are.12

  The first is based on the work done at Ellis Island by H. H. Goddard, who explicitly preselected his sample for evidence of low intelligence (his purpose was to test his test’s usefulness in screening for feeblemindedness), and did not try to draw any conclusions about the general distribution of intelligence in immigrant groups.13 The second has a stronger circumstantial case: Brigham published his book just a year before Congress passed the Immigration Restriction Act of 1924, which did indeed tip the flow of immigrants toward the western and northern Europeans. The difficulty with making the causal case is that a close reading of the hearings for the bill shows no evidence that Brigham’s book in particular or IQ tests in general played any role.14

  Critics responded vocally. Young Walter Lippmann, already an influential columnist, was one of the most prominent, fearing power-hungry intelligence testers who yearned to “occupy a position of power which no intellectual has held since the collapse of theocracy.”15 In a lengthy exchange in the New Republic in 1922 and 1923 with Lewis Terman, premier American tester of the time and the developer of the Stanford-Binet IQ test, Lippmann wrote, “I hate the impudence of a claim that in fifty minutes you can judge and classify a human being’s predestined fitness in life. I hate the pretentiousness of that claim. I hate the abuse of scientific method which it involves. I hate the sense of superiority which it creates, and the sense of inferiority which it imposes.”16

  Lippmann’s characterization of the tests and the testers was sometimes unfair and often factually wrong, as Terman energetically pointed out.17 But while Terman may have won the technical arguments, Lippmann was right to worry that many people were eager to find connections between the results of testing and the more chilling implications of social Darwinism. Even if the psychometricians generally made modest claims for how much the tests predicted, it remained true that “IQ”—that single number with the memorable label—was seductive. As Lippmann feared, people did tend to give more credence to an individual’s specific IQ score and make broader generalizations from it than was appropriate. And not least, there was plenty to criticize in the psychometricians’ results. The methods for collecting and analyzing quantitative psychological data were still new, and some basic inferential mistakes were made.

  If the tests had been fatally flawed or merely uninformative, they would have vanished. Why this did not happen is one of the stories we will be telling, but we may anticipate by observing that the use of tests endured and grew because society’s largest institutions—schools, military forces, industries, governments—depend significantly on measurable individual differences. Much as some observers wished it were not true, there is often a need to assess differences between people as objectively, fairly, and efficiently as possible, and even the early mental tests often did a better job of it than any of the alternatives.

  During the 1930s, mental tests evolved and improved as their use continued to spread throughout the world. David Wechsler worked on the initial version of the tests that would eventually become the Wechsler Adult Intelligence Scale and the Wechsler Intelligence Scale for Children, the famous WAIS and WISC. Terman and his associates published an improved version of the Stanford-Binet. But these tests were individually administered and had to be scored by trained personnel, and they were therefore too expensive to administer to large groups of people. Psychometricians and test publishers raced to develop group-administered tests that could be graded by machine. In the search for practical, economical measurements of intelligence, testing grew from a cottage industry to big business.

  World War II stimulated another major advance in the state of the art, as psychologists developed paper-and-pencil tests that could accurately identify specific military aptitudes, even ones that included a significant element of physical aptitude (such as an aptitude for flying airplanes). Shortly after the war, psychologists at the University of Minnesota developed the Minnesota Multiphasic Personality Inventory, the first machine-gradable standardized test with demonstrated validity as a predictor of various personality disorders. Later came the California Psychological Inventory, which measured personality characteristics within the normal range—“social presence” and “self-control,” for example. The testing industry was flourishing, and the annual Mental Measurements Yearbook that cataloged the tests grew to hundreds of pages. Hundreds of millions of people throughout the world were being psychologically tested every year.

  Attacks on testing faded into the background during this period. Though some psychometricians must have known that the tests were capturing human differences that had unsettling political and social implications, no one of any stature was trying to use the results to promote discriminatory, let alone eugenic, laws. And though many intellectuals outside the testing profession knew of these results, the political agendas of the 1940s and 1950s, whether of New Deal Democrats or Eisenhower Republicans, were more pragmatic than ideological. Yes, intelligence varied, but this was a fact of life that seemed to have little bearing on the way public policy was conducted.

  INTELLIGENCE BESIEGED

  Then came the 1960s, and a new controversy about intelligence tests that continues to this day. It arose not from new findings but from a new outlook on public policy. Beginning with the rise of powerful social democratic and socialist movements after World War I and accelerating across the decades until the 1960s, a fundamental shift was taking place in the received wisdom regarding equality. This was most evident in the political arena, where the civil rights movement and then the War on Poverty raised Americans’ consciousness about the nature of the inequalities in American society. But the changes in outlook ran deeper and broader than politics. Assumptions about the very origins of social problems changed profoundly. Nowhere was the shift more pervasive than in the field of psychology.

  Psychometricians of the 1930s had debated whether intelligence is almost entirely produced by genes or whether the environment also plays a role. By the 1960s and 1970s the point of contention had shifted dramatically. It had somehow become controversial to claim, especially in public, that genes had any effect at all on intelligence. Ironically, the evidence for genetic factors in intelligence had greatly strengthened during the very period when the terms of the debate were moving in the other direction.

  In the psychological laboratory, there was a similar shift. Psychological experimenters early in the century were, if anything, more likely to concentrate on the inborn patterns of human and animal behavior than on how the learning process could change behavior.18 But from the 1930s to the 1960s, the leading behaviorists, as they were called, and their students and disciples were almost all specialists in learning theory. They filled the technical journals with the results of learning experiments on rats and pigeons, the tacit implication being that genetic endowment mattered so little that we could ignore the differences among species, let alone among human individuals, and still discover enough about the learning process to make it useful and relevant to human concerns.19 There are, indeed, aspects of the learning process that cross the lines between species, but there are also enormous differences, and these differences were sometimes ignored or minimized when psychologists explained their findings to the lay public. B. F. Skinner, at Harvard University, more than any other of the leading behaviorists, broke out of the academic world into public attention with books that applied the findings of laboratory research on animals to human society at large.20

  To those who held the behaviorist view, human potential was almost perfectly malleable, shaped by the environment. The causes of human deficiencies in intelligence—or parenting, or social behavior, or work behavior—lay outside the individual. They were caused by flaws in society. Sometimes capitalism was blamed, sometimes an uncaring or incompetent government. Further, the causes of these deficiencies could be fixed by the right public policies—redistribution of wealth, better education, better housing and medical care. Once these environmental causes were removed, the deficiencies should vanish as well, it was argued.

  The contrary notion—that individual differences could not easily be diminished by government intervention—collided head-on with the enthusiasm for egalitarianism, which itself collided head-on with a half-century of IQ data indicating that differences in intelligence are intractable and significantly heritable and that the average IQ of various socioeconomic and ethnic groups differs.

  In 1969, Arthur Jensen, an educational psychologist and expert on testing from the University of California at Berkeley, put a match to this volatile mix of science and ideology with an article in the Harvard Educational Review.21 Asked by the Review’s editors to consider why compensatory and remedial education programs begun with such high hopes during the War on Poverty had yielded such disappointing results, Jensen concluded that the programs were bound to have little success because they were aimed at populations of youngsters with relatively low IQs, and success in school depended to a considerable degree on IQ. IQ had a large heritable component, Jensen also noted. The article further disclosed that the youngsters in the targeted populations were disproportionately black and that historically blacks as a population had exhibited average IQs substantially below those of whites.
