Human Diversity
32. Schmitt, Realo, Voracek et al. (2008).
33. A description of the index and the annual scores may be found at the World Economic Forum’s website, www.weforum.org.
34. The countries with the five largest values of D, indicating the largest sex differences in personality, were the Netherlands (1.17), Norway (1.13), Sweden (1.11), Canada (1.07), and the UK (1.06). The countries with the five smallest values of D, indicating the smallest sex differences in personality, were China (0.47), Malaysia (0.58), Japan (0.59), South Korea (0.59), and India (0.70). Author’s analysis of Mac Giolla and Kajonius (2018): Appendix B.
35. Falk and Hermle (2018): 3 of 6. The authors’ Gender Equality Index is an inversion of the UN’s Gender Inequality Index.
36. Falk and Hermle (2018): 3 of 6.
37. A cross-national study conducted two years after the Schmitt study, Lippa (2010), provides additional evidence for these patterns but poses problems of interpretation. Psychologist Richard Lippa used a BBC Internet survey conducted in 2005 that attracted 255,114 people who responded to at least some items in each of six modules. From these data, Lippa compared 53 nations on measures he constructed for extraversion, agreeableness, and emotional stability using indicators from the Cattell inventory. The effect sizes for the three were +0.18, +0.61, and –0.41 respectively. The effect sizes in Lippa’s data showed significant widening of the difference in agreeableness as gender equality increased (r = –.41 between agreeableness and the Gender Inequality Index), minor widening for emotional stability (r = –.14), and no significant effect for extraversion (r = +.04). Lippa (2010): Table 1. The difficulty in interpreting the Lippa findings arises from the nature of the sample. The Internet survey was conducted exclusively in English, which means that the sample is self-selected for people who speak English and use the BBC and the Internet. Among those for whom English is a second language, some questions may have been misunderstood. Despite these reasons for expecting that Lippa’s sample could produce quite different results from the other cross-national studies (and there were indeed some differences), the same pattern of results was found: “In summary, the current results strongly supported Costa et al.’s (2001) and Schmitt et al.’s (2008) findings that sex differences in personality are highly replicable across cultures. However, they were sometimes inconsistent with Costa et al.’s conclusion that sex differences in personality are necessarily ‘modest in magnitude’ (p. 328).
The mean effect sizes for sex differences in agreeableness and emotional stability, although moderate in magnitude, were still well within the range of effect sizes for many classic person and situation effects in psychology (Eagly, 1995; Lipsey & Wilson, 1993). More dramatically, when analyzed at the aggregated level of men’s and women’s national means, sex accounted for 93% of the variance in MF-Occ means, 75% of the variance in agreeableness means, 68% of the variance in emotional stability means, and 23% of the variance in extraversion, and in each case, sex accounted for much greater amounts of variance than did either UN gender equality or the interaction of sex and gender equality.” Lippa (2010): 634.
38. Costa, Terracciano, and McCrae (2001): 329.
39. Costa, Terracciano, and McCrae (2001). A more elaborate version of this argument is given in Guimond, Branscombe, Brunot et al. (2007).
40. Lippa, Collaer, and Peters (2010).
41. Schmitt, Long, McPhearson et al. (2016): 6.
3: Sex Differences in Neurocognitive Functioning
1. The subsequent list of findings is taken from Halpern (2012): 104–8.
2. Halpern (2012): 104–8.
3. Binaural beats are an auditory phenomenon that occurs when two slightly different frequencies of sound are played, one into each ear. Otoacoustic emissions are caused by the motion of the cochlea’s sensory hair cells.
4. Halpern (2012): 106. In childhood, part of the sex difference may be explained by girls’ superior verbal skills, but other studies have found the female advantage in identifying smells in adult samples as well.
5. Halpern (2012): 109.
6. Females are not noticeably better at judging the passage of time, but there is a systematic sex difference in direction of error. Men tend to overestimate a time interval while women tend to underestimate it. Halpern (2012): 107–8.
7. Fillingim, King, Ribeiro-Dasilva et al. (2009); Rosen, Ham, and Mogil (2017).
8. Fuller (2002).
9. Fleischman (2014); Al-Shawaf, Lewis, and Buss (2017).
10. Halpern (2012): 108–10.
11. Nagy, Kompagne, Orvos et al. (2007); Piek (2002).
12. Halpern (2012): 109.
13. Halpern (2012): 110.
14. Watson and Kimura (1991).
15. Halpern (2012): 115–18.
16. Heuer and Reisberg (1990); Cahill, Haier, White et al. (2001).
17. Geary (2010).
18. Halpern (2012): 118–28.
19. Verbal reasoning. The Cognitive Abilities Test (CogAT) is a test of three types of reasoning: verbal, quantitative, and nonverbal. Since it was first published in 1984, it has undergone three revisions, in 1992, 2000, and 2011. In the four standardization samples, girls performed better than boys in virtually all grades in all standardizations (out of 28 combinations of grade level and standardization, the difference favored girls in 24). Lakin (2013): Table 3. Effect sizes were uniformly small, averaging +0.07 over all grades and standardizations. Only one of the 28 grade/standardization combinations reached an effect size of +0.20. A British sample of 320,000 students ages 11–12 who took the CogAT in 2001–3 showed a larger female advantage (d = +0.15).
Reading. America’s National Assessment of Educational Progress (NAEP) is administered periodically to large, nationally representative samples, in grades 4, 8, and 12. In the reading test, girls outperformed boys in every grade and every assessment from 1988 to 2015, with an overall effect size of +0.27. Reilly, Neumann, and Andrews (2018), which was also the source of subsequent reading and writing NAEP statistics. Twelfth-grade girls have had an advantage in the reading test that goes back to the first administration of the test in 1971. Effect sizes from 1971 to 1992 were in the +0.21 to +0.30 range. Hedges and Nowell (1995): Table 3. There was no trend across the years in the size of the difference, but grade level had a marked effect, with d rising from +0.19 in grade 4 to +0.30 in grade 8 and +0.32 in grade 12. The +0.19 effect size for grade 4 is consistent with an analysis of 2015 performance on the federally required Reading/Language Arts assessments, which found an effect size for reading of +0.19, combining scores from grades 3 through 6. Peterson (2018): Table 1. These effect sizes translate into substantial differences in the number of boys who failed to reach the minimum standard of literacy by grade 12 (1.5 times as many boys as girls) and the number of girls reaching the advanced literacy standard (1.9 times as many girls as boys).
Writing. The female advantage in writing tests is larger than in reading tests. In NAEP writing tests from 1988 to 2011, the overall effect size was +0.54. As in the case of the reading test, there was no significant trend over the years, but there was a significant change in effect size from grade 4 to grade 8, when it rose from +0.42 to +0.62. It stood at +0.55 in grade 12. Reilly, Neumann, and Andrews (2018). Once again, the results from the Reading/Language Arts assessment in 2015 correspond with the NAEP results. The same study that found a female advantage of just +0.19 on the reading test for grades 3 through 6 found an effect size of +0.45 on the writing test. Peterson (2018): Table 1. These effect sizes translated into even larger disparities at the low and high ends, with 2.2 times as many boys failing to meet the minimum writing standard and 2.5 times as many girls reaching the advanced standard. For additional evidence that the effect size is larger for writing than for reading, and for literature reviews, see Reynolds, Scheiber, Hajovsky et al. (2015) and Scheiber, Reynolds, Hajovsky et al. (2015).
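The translation from an effect size to these tail ratios can be sketched with a short computation. It is a minimal illustration that assumes both sexes’ scores are normally distributed with equal standard deviations; the cutoff values below are illustrative, not the actual NAEP standards.

```python
from statistics import NormalDist

def tail_ratio(d, cutoff_z, lower_tail=True):
    """Ratio of the male to the female proportion beyond a cutoff,
    modeling both sexes as normal with equal SDs and a mean
    difference of d (positive d = female advantage)."""
    male = NormalDist(mu=0.0, sigma=1.0)
    female = NormalDist(mu=d, sigma=1.0)
    if lower_tail:
        return male.cdf(cutoff_z) / female.cdf(cutoff_z)
    return (1 - male.cdf(cutoff_z)) / (1 - female.cdf(cutoff_z))

# With the grade 12 reading effect size of d = +0.32 and an
# illustrative low cutoff 1.5 SDs below the male mean:
print(round(tail_ratio(0.32, -1.5), 2))                     # boys per girl below the cutoff
print(round(1 / tail_ratio(0.32, 1.5, lower_tail=False), 2))  # girls per boy above the cutoff
```

The exact ratios depend heavily on where the cutoff is placed: the further into the tail the standard lies, the larger the disparity a fixed d produces.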
20. Arnett, Pennington, Peterson et al. (2017).
21. In the most recent follow-up, combining 320,000 TIP students from 2011 to 2015, the male-female ratio for the top percentile was 0.88 for the SAT verbal, 0.79 for the ACT English test, and 1.09 for the ACT reading test. For the top 0.5 percent, the corresponding ratios were 0.96, 0.94, and 0.88. For the top 0.01 percent, the corresponding ratios were 0.73, 0.86, and 0.95. Makel, Wai, Peairs et al. (2016): Table 6. For tests administered until 1994, SAT scores could be interpreted relative to the national population because the Educational Testing Service conducted periodic norm samples explicitly designed for that purpose. But then the SAT was “recentered” in 1995 so that the mean was once again set at 500 (it had fallen to 428 on the verbal test and 482 on the math test). The substance of the test was revised in 2005 and again in 2017. Whether these changes were good or bad is debated (for the record, I think mostly bad), but without question those changes have made trends over time impossible to interpret relative to the general population.
Data from the standardizations of the CogAT, representing tests of children from grades 4 to 7, show virtual equality in the top five percentiles and the top percentile of the verbal/reading domain from 1992 onward. In the original 1984 standardization, the male-female ratios were 1.15 and 1.25 for the top five percentiles and the top percentile respectively. Girls had a fractional advantage in both categories in the 1992 and 2000 standardizations, while boys had a fractional advantage in both categories in the 2011 standardization. Lakin (2013): Table 4.
22. The statement is based on information from the last national norm study conducted by the Educational Testing Service in the mid-1980s. Braun, Centra, and King (1987) combined with College Board (2016). I used the 2016 scores because the College Board introduced major changes in the test in 2017, making scores from 2017 onward incomparable with previous administrations. The table showing scores from 1972 to 2016 used a correction for the recentering of the test in 1995. Standard deviations were retrieved from the individual annual reports for college-bound seniors. The two sources together indicate that about 67 percent of the SAT pool had scores above the mean that would have been obtained if all juniors took the SAT. Extrapolating that 1983 number to recent decades involves some guesstimates (25 percent of 17-year-olds took the SAT in 1983 compared to 40 percent in recent years), but that the test-taking population is still concentrated in the upper half of the ability distribution seems incontestable.
23. In terms of effect sizes, the SAT and ACT tell the same story—small—but they have inconsistent signs. For the 45 years from 1972 to 2016, females have always had slightly lower SAT verbal scores than males—a surprising contrast to the universal female advantage for verbal tests of nationally representative samples. The effect size moved in a narrow range from –0.02 to –0.12 during those decades, with a mean of –0.06. College Board (2016): 2. The table showing scores from 1972 to 2016 used a correction for the recentering of the test in 1995. Standard deviations were retrieved from the individual annual reports for college-bound seniors.
On the SAT writing test introduced in 2006 and discontinued as of 2017, females had a small advantage that was never less than +0.10 and never larger than +0.12. There was no trend over time on either test. But for the ACT from 1995 to 2016, females have maintained an advantage in both the reading and the English tests. Effect sizes for the reading test moved in a range from +0.02 to +0.09, with a mean of +0.06. Effect sizes for the English test moved in a range from +0.11 to +0.17, with a mean of +0.14. Tables from the Digest of Educational Statistics for 2009 (Table 147) and 2016 (Table 226.50), downloadable from the National Center for Education Statistics website, nces.ed.gov/programs/digest.
One of the imponderables about SAT scores is the growing imbalance in the numbers of males and females taking the test. In 1981, about 22 percent of male 17-year-olds took the SAT compared to 25 percent of female 17-year-olds. (More precisely, that percentage represents the number of test-takers divided by the number of 17-year-olds. Some test-takers are older or younger than 17, but the percentage would change only fractionally if broken down by age at testing.) The gap first hit four percentage points in 1987, five percentage points in 1992, six percentage points in 1996, and seven percentage points in 2000. The difference is large enough that it must be assumed to deflate the female mean because the female test-taking pool dips deeper into the cognitive distribution than does the male pool. The effect is unlikely to be much, but it should be kept in mind. My wordings of the conclusions I draw are intended to tolerate such a bias.
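The deflation argument can be illustrated with a small computation. Under the idealized assumption that each sex’s test-taking pool is exactly the top fraction of a normal ability distribution (real pools are of course not selected that cleanly), the mean of the pool falls as the fraction grows:

```python
from statistics import NormalDist

def mean_of_top_fraction(p):
    """Mean score (in SD units) of the top fraction p of a standard
    normal population, assuming test-takers are exactly the top p."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - p)   # cutoff score for the top p of the distribution
    return nd.pdf(z) / p    # mean of a standard normal truncated below at z

# Illustrative 1981 fractions: 22 percent of boys vs. 25 percent of girls.
print(round(mean_of_top_fraction(0.22), 3))  # mean of the male pool
print(round(mean_of_top_fraction(0.25), 3))  # mean of the (deeper) female pool
```

With these fractions, the idealized gap between the pool means is under a tenth of a standard deviation, consistent with the point that the effect is real but unlikely to be much.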
This inconsistency in the results from the SAT and ACT mirrors the earlier data reported in a major review of the nationally representative populations from 1960 to 1995. Larry Hedges and Amy Nowell reported the male-female ratio among top scorers in reading comprehension for two major surveys of high school seniors: the National Longitudinal Study of the High School Class of 1972 and the 1980 High School and Beyond dataset. For scores in the 90th percentile and higher, the ratios were 0.94 and 1.03—meaning females retained a small advantage in one of the two. For those in the 95th percentile and higher, the ratios were 0.81 and 1.06 respectively, thereby showing the same inconsistency. It appears that for 17–18-year-olds from the upper half of the distribution to the 95th percentile, results can go both ways, depending on the test, sometimes favoring females and sometimes favoring males, always by small margins. The minimal conclusion is that the extremely consistent female advantage in the normal range of verbal ability becomes less dependable as the ability level rises.
24. Halpern (2012): 146–50.
25. Reilly, Neumann, and Andrews (2015): Table 1. The effect sizes for the 2015 NAEP in grades 4, 8, and 12 were –0.07, 0.00, and –0.08 respectively. Author’s analysis of data downloaded using the NAEP Explorer tool of a National Center for Education Statistics website, www.nationsreportcard.gov.
26. Author’s analysis of data downloaded using the NAEP Explorer tool.
27. Halpern (2012): 146.
28. For literature reviews, see Penner (2003) and Wai, Cacchio, Putallaz et al. (2010).
29. The Study of Mathematically Precocious Youth, which figures prominently in the next chapter, found ratios of about 13 favoring males in this range, Benbow and Stanley (1980), as did Wai, Cacchio, Putallaz et al. (2010) for TIP students in 1981–85.
30. The earliest reliable estimates of differentials at the extremes of mathematical ability come from Project Talent, a study based on a nationally representative sample of American 15-year-olds conducted in 1960. The male-female sex ratio was 1.3 for scores in the top 10 percent, 1.5 in the top 5 percent, 2.1 in the top 3 percent, and 7.0 in the top 1 percent. Hedges and Nowell (1995): 44. The best longitudinal data since then come from Duke’s TIP program. Makel, Wai, Peairs et al. (2016): Table 5.
31. 2016 College-Bound Seniors: Total Group Profile Report, College Board: 2, downloadable from a College Board website, reports.collegeboard.org. Halpern does not report ACT results by gender for the math test, and the ACT does not publish those data.
32. For a history of the SAT, see Lemann (1999). For a presentation of the psychometric properties of the test that tacitly confirm its measurement of the general mental factor known as g, see Donlon (1984).
33. For the technical debate about test bias, see Mattern and Patterson (2013) and Aguinis, Culpepper, and Pierce (2016). Insofar as there is bias (it is minor), it favors blacks—SAT scores slightly overpredict college grades for blacks. Aguinis, Culpepper, and Pierce (2016) provide evidence that there is differential predictive validity at the level of individual institutions, but even that differential prediction does not systematically favor whites or males. For a nontechnical discussion of the SAT, including issues about what it measures and how much good coaching does, see my article “Abolish the SAT,” available at the AEI website, www.aei.org.
34. AMC tests are given in over 3,000 high schools. I focus on AMC12, the test that is taken by 11th and 12th graders. From 2009 to 2018 (the years for which detailed data are available), the AMC12 has been taken by 59,000 to 115,000 students per year. This is a small number compared to the million-plus students who took the SAT during the same period, but the schools that give the test are concentrated among the top high schools in major urban areas, which in turn, for demographic reasons, probably contain most of the extremely talented math students in the country. Furthermore, talented math students who are applying to elite colleges have a strong incentive to take the AMC12—it’s a much more difficult test than either the SAT or ACT math tests, and a high score on the AMC can set their applications apart from the many applicants to elite colleges who have an 800 on the SAT.
The test has a score range of 0 to 150. In their analysis of the AMC competitions, Ellison and Swanson (2010) concluded that a score of 100 on the AMC12 is equivalent to a score of 780–800 on the SAT math. I conducted the analysis of the 2009–18 administrations of AMC12, using data downloaded from the AMC website. Depending on the year, a score of 100 put a student at anywhere from the 94th to the 98th percentile among those who took the AMC. Thus the AMC12 gives us a glimpse into sex differences deep into the top percentile of the total population.
35. The percentiles are based on the distribution for the entire population of test-takers in a given year. The ratio divides the percentage of male test-takers who scored within that percentile that year by the percentage of female test-takers who scored within that percentile that year.
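The computation described here can be made concrete with hypothetical counts (the numbers below are invented for illustration and are not the AMC figures):

```python
# Hypothetical counts: male and female test-takers in one year,
# and how many of each scored at or above the cutoff for the
# top percentile of all test-takers combined.
males_total, females_total = 60_000, 40_000
males_top, females_top = 900, 300

pct_male = males_top / males_total        # 1.5% of male test-takers
pct_female = females_top / females_total  # 0.75% of female test-takers
ratio = pct_male / pct_female
print(ratio)  # 2.0
```

Dividing within-sex percentages rather than raw counts keeps the ratio from being distorted by the unequal numbers of male and female test-takers.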
36. Halpern (2012): 128–45.