Human Diversity
37. Halpern (2012): 138.
38. Halpern (2012): 130.
39. Halpern (2012): 130.
40. For a literature review, see Vasta and Liben (1996). For international results, see de Lisi, Parameswaran, and McGillicuddy-de Lisi (1989).
41. Law, Pellegrino, and Hunt (1993).
42. Contreras, Rubio, Peña et al. (2007).
43. Huguet and Régner (2007) found that girls performed worse than boys at such a memory task when it was described as testing memory for geometry but better than boys when it was described as testing memory for drawing. The more recent literature on stereotype threat and math and visuospatial skills more commonly has found little or no effect and also found evidence of publication bias; e.g., Pennington, Litchfield, McLatchie et al. (2018); Stoet and Geary (2012); Ganley, Mingle, Ryan et al. (2013). For more on stereotype threat, see chapter 13.
44. Men also have an advantage in visuospatial knowledge and memory. For example, they learn a route from a two-dimensional map in fewer trials and with fewer errors than women. This does not seem to be a function of greater male driving experience. Rather, men and women tend to use different strategies in finding their way from point A to point B, with women more likely to use landmarks (“turn right at the bank”), while men are more likely to use points of the compass (“turn north”) and distances (“turn after three miles”), and to create mental maps. Boone, Gong, and Hegarty (2018). In this regard, Halpern describes at length the curious story of the National Geography Bee—a competition similar to the National Spelling Bee, except that it asks questions about geographical features. An article by Lynn Liben in 1995 pointed to what she described as “a shocking gender disparity among winners at every level.” In 1993, about 14,000 of 18,000 school winners had been boys, as were 55 of the 57 winners in states and territories. That disparity has persisted undiminished in the quarter century since Liben’s article. Through 2018, 27 of the 29 winners were male. What makes this especially intriguing is the contrast with the National Spelling Bee. Both competitions require extraordinary memorization. Neither geography nor spelling plays to stereotypical male or female interests. And yet while 27 of 29 National Geography Bee winners have been male, the 96 winners of the Spelling Bee have been split almost equally between girls and boys, with girls holding a 49-to-47 advantage as of 2018. Furthermore, girls have more than held their own since the earliest years—girls won 7 of the first 10 contests.
45. The classic test for ToM is called a “false-belief task.” For example, a child is given pictures of another child (Bill), a playground, and a classroom. The child is told that Bill is going to look for his lunch bag. Bill’s lunch bag is really on the playground, but Bill thinks his lunch bag is in the classroom. “Where do you think Bill will look for his lunch bag?” Until they’re about three, children expect Bill to look for it where the child knows it is—on the playground. Around age four, children are able to predict Bill’s behavior on the basis of Bill’s false belief, not their own true belief. See Premack and Woodruff (1978).
46. Maccoby and Jacklin (1974): 214.
47. Hall (1978): Table 4.
48. E.g., Rosip and Hall (2004); Schmid, Schmid Mast, Bombari et al. (2011).
49. Thompson and Voyer (2014): 1175. The authors followed a common practice of establishing a lower bound by assigning an effect size of zero to any study reporting that the effect size was statistically insignificant but without reporting its magnitude. This amounted to 147 out of 551 effect sizes in the meta-analysis. When they were excluded, the mean effect size was +0.27.
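The arithmetic behind a lower bound of this kind can be sketched in a few lines. This is only an illustration, assuming a simple unweighted average of effect sizes; meta-analyses typically weight studies, so the result is not expected to match the published estimate exactly. The function name `lower_bound_mean` is hypothetical. The counts (147 of 551) and the +0.27 mean come from the note above.

```python
def lower_bound_mean(mean_reported: float, n_reported: int, n_zeroed: int) -> float:
    """Unweighted mean after adding n_zeroed effect sizes that are set to zero.

    Assumption: every effect size counts equally (no study weights).
    """
    total = n_reported + n_zeroed
    # The zeroed effect sizes add nothing to the sum but do enlarge the count.
    return mean_reported * n_reported / total


# Figures taken from the note: 551 effect sizes, 147 of them assigned zero,
# and a mean of +0.27 among the remaining ones.
n_total, n_zeroed = 551, 147
estimate = lower_bound_mean(0.27, n_total - n_zeroed, n_zeroed)
print(f"{estimate:.3f}")
```

Under that unweighted assumption, setting about a quarter of the effect sizes to zero shrinks the mean by about a quarter, which is why the procedure yields a conservative estimate.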
50. Thompson and Voyer (2014): Table 2. I cannot resist noting a detail of the analyses reported in Table 2. Almost all of the 551 effect sizes in the studies used for the meta-analysis had a mix of male and female subjects, and they showed a lower-bound female advantage of +0.17. Thirty-one studies had only female subjects, and the female advantage was +0.18. Just eight of the studies had exclusively male subjects. The female advantage was +0.61. To put it another way, women may be only modestly better than men at figuring out the emotions that women are feeling, but they are definitely better than men at figuring out the emotions that men are feeling.
Another interesting finding in the Thompson meta-analysis involved variations in effect size depending on the kind of emotion involved. In the most widely used classification, the six basic emotions are happiness, anger, sadness, fear, surprise, and disgust. Ekman (1999). Past research has established that negative emotions are more difficult to identify than positive ones—and that’s where women’s advantage over men was concentrated. The lower bound effect sizes were +0.24 for negative emotions, compared to +0.19 for positive emotions. Thompson and Voyer (2014): Table 2.
51. Joseph and Newman (2010): Table 6.
52. There is intriguing experimental evidence on this point. In one set of experiments, participants were given a variety of incentives that would encourage them to “try harder” to understand the emotional states of others. It didn’t seem to make any difference, just as “trying harder” doesn’t make any difference in being able to mentally rotate objects in three dimensions. Hall, Blanch, Horgan et al. (2009).
53. Valla, Ganzel, Yoder et al. (2010).
54. Herlitz and Yonker (2010): 112.
55. Allen, Rueter, Abram et al. (2017).
56. Full-scale IQ tests such as the Wechsler or Stanford-Binet are designed to minimize sex differences. From the earliest versions of such tests, each item in each subtest has been examined to see whether it is easier for one sex than the other. If so, either a counterbalancing item favoring the other sex has been included in that subtest or the item has been removed altogether. For a discussion of the methods for eliminating gender differences in IQ subtests (which extends to a citation of a book written in 1914), see Matarazzo (1972): 352–58. This practice cannot altogether remove gender differences in the final published test, because the sample used for the item analysis and the sample used for establishing the national norms are different, but the process introduces a heavy thumb on the scale that prevents large gender differences from emerging.
57. The standardization sample for the 1988 revision of the Woodcock-Johnson battery of cognitive tests showed a female advantage of 2.4 points (author’s analysis, data provided courtesy of the Woodcock-Johnson Foundation). The results apply to test-takers ages 18 through 65. In one of the largest surveys ever conducted, the nearly universal Scottish Mental Survey of 1947, females had a mean that was 1.7 points higher than the male mean. Johnson, Carothers, and Deary (2008): 521. In the large British sample of children ages 11–12 who took CogAT in 2001–3, the total score slightly favored girls, with an effect size of +0.05. Strand, Deary, and Smith (2006): Table 3.
58. Data are taken from the WAIS technical manuals published with each standardization.
59. Jensen (1998): 536.
60. Jensen (1998): 538. Colom, Juan-Espinosa, Abad et al. (2000); Colom, García, Juan-Espinosa et al. (2002); and Deary, Thorpe, Wilson et al. (2003) conducted analyses of the issue and came to the same conclusion.
61. Lynn (1994) reopened the debate, arguing that there are no sex differences or a slight female advantage in early adolescence, but a nontrivial male advantage opens by late adolescence. Lynn has continued to compile additional data supporting the existence of a nontrivial sex difference in g, most recently in Lynn (2017), a target article that attracted 11 detailed commentaries. Taken together, they will give you a good overview of the state of the debate. Other articles for the affirmative since the question was reopened in 1994 are Lynn (1999); Irwing and Lynn (2005); Jackson and Rushton (2006); and Irwing (2012). Others for the negative are Colom, Juan-Espinosa, Abad et al. (2000); Halpern and LaMay (2000); Blinkhorn (2005); van der Sluis, Posthuma, Dolan et al. (2006); and Iliescu, Ilie, Ispas et al. (2016). For a recent article with new data and an overview of previous research, see Arribas-Aguila, Abad, and Colom (2019).
62. The most widely publicized study that made the case for a relationship between test scores and gender egality, Else-Quest, Hyde, and Linn (2010), analyzed data from the 2003 round of the Trends in International Mathematics and Science Study (TIMSS) and the 2003 PISA round.
63. Author’s analysis, 2015 PISA scores downloaded from the OECD website, www.oecd.org/pisa/. That database includes the standard deviation for each country, enabling effect sizes to be country-specific in both their means and standard deviations. The means given in the text are calculated as the mean of the individual effect sizes for the 67 countries. If instead the effect size is calculated from the mean male and female test scores and the standard deviations of those means, the female advantage is not +0.32, as given in the text, but +0.62. The effect sizes for math and science are –0.09 and +0.01 respectively, not much different from those reported in the text.
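The difference between the two calculations described in this note can be sketched as follows. This is a minimal illustration, not the author's actual procedure. The three "countries" and their scores are invented toy values (not PISA data), the function names are hypothetical, and pooling the two between-country standard deviations in the second method is an assumption about what "the standard deviations of those means" refers to.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data, NOT real PISA values:
# country -> (female mean, male mean, within-country standard deviation)
countries = {
    "A": (510.0, 480.0, 95.0),
    "B": (450.0, 425.0, 85.0),
    "C": (400.0, 370.0, 90.0),
}


def mean_of_country_effect_sizes(data):
    """Method 1: compute an effect size inside each country, then average them."""
    return mean((f - m) / sd for f, m, sd in data.values())


def effect_size_from_country_means(data):
    """Method 2: treat each country's mean as one observation.

    Assumption: the denominator pools the between-country standard
    deviations of the female means and of the male means.
    """
    f_means = [f for f, _, _ in data.values()]
    m_means = [m for _, m, _ in data.values()]
    pooled_sd = sqrt((stdev(f_means) ** 2 + stdev(m_means) ** 2) / 2)
    return (mean(f_means) - mean(m_means)) / pooled_sd


d_within = mean_of_country_effect_sizes(countries)
d_between = effect_size_from_country_means(countries)
print(f"mean of country effect sizes: {d_within:+.2f}")
print(f"effect size from country means: {d_between:+.2f}")
```

The two methods divide roughly the same raw score gap by different yardsticks: the spread of students within a country versus the spread of country averages. When country averages vary less than individual students do, as in this toy data, the second method gives the larger figure.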
64. Guiso, Monte, Sapienza et al. (2008); Marks (2008); Else-Quest, Hyde, and Linn (2010).
65. Stoet and Geary (2013). See also Stoet and Geary (2015).
66. The sign for the GII was reversed to make it correspond to the sign of the other two (so that “high” = “good” on all three). The correlations among the three separate indexes for 2015 are +.64 for the GDI and GII, +.64 for the GDI and GGI, and +.52 for the GII and GGI. The face validity for the combined indexes is good. The top-fifteen-ranked nations in gender egality are, in order, Iceland, Finland, Norway, Sweden, Slovenia, Switzerland, Ireland, Estonia, Lithuania, Latvia, Denmark, Germany, France, Belarus, and Belgium. The bottom fifteen, starting with the last-ranked, were Yemen, Pakistan, Chad, Mali, Côte d’Ivoire, Syria, Mauritania, Morocco, Liberia, Iran, Benin, Jordan, Burkina Faso, Gambia, and Swaziland.
In some individual countries, the effect sizes of the PISA or TIMSS surveys were nontrivial. Here are the countries with an absolute difference on the math test of d ≥ 0.20 on either the PISA or the TIMSS surveys:
Male advantage
Honduras (–0.31)
Austria (–0.29)
Ghana (–0.28)
Argentina (–0.24)
Costa Rica (–0.24)
Tunisia (–0.22)
Chile (–0.22)
Lebanon (–0.22)
Italy (–0.21)
Ireland (–0.21)
Female advantage
Oman (+0.60)
Bahrain (+0.44)
Jordan (+0.29)
Thailand (+0.21)
The greatest female advantage was in Oman, ranked 104 out of 134 countries on a combined index of gender egality. Jordan was ranked 123rd, Bahrain 86th, and Thailand 60th. Meanwhile, the countries with a male advantage were evenly split among nations in the top and bottom half of the rankings. This explains the weak correlations between the index of gender egality and the effect size on the math test: just –.17 for the PISA dataset and –.05 for the TIMSS dataset.
Now for the science test, again showing the countries with an absolute difference of d ≥ 0.20.
Male advantage
Ghana (–0.27)
Honduras (–0.26)
Costa Rica (–0.26)
Tunisia (–0.25)
New Zealand (–0.24)
Hungary (–0.22)
Chile (–0.21)
Italy (–0.20)
Female advantage
Jordan (+0.47)
Albania (+0.31)
United Arab Emirates (+0.25)
Qatar (+0.23)
Trinidad (+0.23)
Finland (+0.21)
Algeria (+0.20)
Countries far down the list on the egality index—Jordan (123rd), Qatar (102nd), United Arab Emirates (78th)—were associated with a female advantage on science, while two of the countries where males had an advantage were well into the upper half—New Zealand (16th) and Italy (28th). The correlation of the egality index with the effect size in science was –.31 in the PISA dataset and –.22 in the TIMSS dataset.
67. Author’s analysis of the 2015 PISA results.
68. Lippa, Collaer, and Peters (2010): 993.
69. Lippa, Collaer, and Peters (2010): 993.
70. The problems in interpreting the Lippa findings for personality traits (see note 37 for chapter 2) were less serious for the results for visuospatial skills. It remains true that the test was effectively restricted to English speakers, but the instructions for completing the visuospatial questions were insensitive to level of English ability (unlike questions about personality that used vocabulary with subtle distinctions in meaning).
71. Gur and Gur (2017): Fig. 2.
72. Roalf, Gur, Ruparel et al. (2014).
73. Roalf, Gur, Ruparel et al. (2014).
74. Gur and Gur (2017): 191. The references are Linn and Peterson (1985); Thomas and French (1985); Voyer, Voyer, and Bryden (1995); Halpern, Benbow, Geary et al. (2007); Williams, Mathersul, Palmer et al. (2008); Hines (2010); and Moreno-Briseño, Diaz, Campos-Romo et al. (2010).
75. Boone, Gong, and Hegarty (2018).
76. Men and women perform equally on many navigational tasks, but Nazareth, Huang, Voyer et al. (2019), a meta-analysis of human navigation literature covering 694 effect sizes from 266 studies, found an overall d of –.34 to –.38 (favoring males).
77. Johnson and Bouchard (2007). See also Johnson and Bouchard (2005).
78. Johnson and Bouchard (2007): 24.
79. By using factor analysis and regression analysis in tandem, Johnson and Bouchard were able to calculate residual effect sizes. Their hypothesis about different toolboxes posited two conditions: First, g makes use of a problem-solving toolbox that differs from individual to individual. Second, the overall usefulness of the tools and skill in their use are evenly distributed between men and women. “In the presence of these conditions,” the authors wrote, “g should tend to mask sex differences in the specialized tools that contribute to more specialized abilities. Thus, its removal from the scores on a battery of tests through statistical regression should reveal greater sex differences in the residual scores than in the original full scores, and the sex differences in the scores on the residual factor scores should also be larger than commonly observed sex differences in mental ability test scores.” They also hypothesized that, given these conditions, it would be possible to understand more clearly the dimensions on which cognitive abilities tend to differ between men and women—that is, differences either in the tools that tend to be in their boxes or the ways they use them, or both. Johnson and Bouchard (2007): 25.
80. The exceptions were picture arrangement (p < .009) and WAIS information (p < .002).
4: Sex Differences in Educational and Vocational Choices
1. In the United States, males account for about 80 percent of arrests for violent crime. Federal Bureau of Investigation, Uniform Crime Report for 2017: Table 33. For broader evidence on predominantly male antisocial behavior, see Heidensohn and Silvestri (2012) and Del Giudice (2015).
2. Lubinski, Benbow, and Kell (2014): Figs. 4–5. The numbers for the effect sizes depicted in the bar chart were provided by David Lubinski, personal communication.
3. None of the statements in quotes with which respondents agreed or disagreed reached an effect size of +0.35 for Cohort 2. Lubinski, Benbow, and Kell (2014): Fig. 5.
4. There were also questions about working no more than 50 and 60 hours a week, which women also answered affirmatively more than men (effect sizes were +0.53 and +0.44 respectively).
5. “IQs of about 140 or higher” is an estimate, since the SMPY youngsters weren’t given IQ tests. The cutoff for percentile 99.5 for full-scale IQ tests, which are normed to a mean of 100 and a standard deviation of 15, is 139.
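The cutoff quoted in this note follows from the normal distribution and can be checked with Python's standard library. The sketch assumes IQ scores are exactly normally distributed with mean 100 and standard deviation 15, as the note states for full-scale tests. The helper name `iq_cutoff` is hypothetical.

```python
from statistics import NormalDist

# Assumption: full-scale IQ is normally distributed, mean 100, SD 15.
IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)


def iq_cutoff(percentile: float) -> float:
    """Return the IQ score at a given percentile, expressed as a fraction (0-1)."""
    return IQ_DISTRIBUTION.inv_cdf(percentile)


# Percentile 99.5, i.e., the top 0.5 percent.
print(round(iq_cutoff(0.995)))
```

Calling `iq_cutoff(0.99)` in the same way gives the score for the top 1 percent.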
6. A student could qualify for SMPY by getting an SAT verbal score of at least 430 regardless of SAT math score, but all of the students chosen for the follow-up had SAT math scores of at least 390, putting them in the top percentile on math among 7th graders. Lubinski, Benbow, Shea et al. (2001): 310.
7. A caveat: SMPY students who were toward the bottom of the top percentile in math (equivalent to an IQ of 135) might have struggled with a mathematics or physics major at a demanding school like MIT or Caltech. But that’s a pretty small caveat.
8. Cohort 1 was recruited in 1972–74 (n = 2,188) from those who scored in the top 1 percent on the SAT math. Beginning with the second wave of Cohort 1 and continuing for the rest of the cohorts, students could qualify via either the SAT math or the SAT verbal. Technically, a student with an extremely high verbal score could be selected without being in the top 1 percent on the SAT math. In practice, almost all of those in Cohorts 2 and 3 who qualified via the SAT verbal were in the top 1 percent on math as well. Cohort 2 was recruited in 1976–79 (n = 778) from the top 0.5 percent. Cohort 3 was recruited in 1980–83 (n = 501) from the top 0.01 percent. Cohort 4 was recruited in 1992–97 (n = 1,130) and consisted of students who were in the top 3 percent of students in academic ability, with a large majority qualifying for the top 0.5 percent. Cohort 5, a form of control group, was recruited in 1992 (n = 714) from graduate students ages 23–25 enrolled in a STEM field at the 15 top-ranked graduate programs in engineering, math, or science. Lubinski and Benbow (2006): Table 1.
Full disclosure: One of my daughters was part of the Talent Search program that generated the SMPY (though not part of any of the follow-up cohorts) in the late 1980s. My description of SMPY parents in the text matches the way her parents were proud of her math talent and urged her to consider STEM fields. She listened attentively. She reached Harvard in the early 1990s while Richard Herrnstein and I were working on The Bell Curve. Herrnstein, who had become her friend, urged her to major in applied math. She listened attentively. And decided on Renaissance history and literature. Why? “It’s complicated,” she says.
9. Raymond and Benbow (1986): 816.
10. The slogan was made famous by Steinem, but she confirmed in a letter to Time magazine (September 16, 2000) that the saying was originated in 1970 by Irina Dunn, an Australian educator.
11. Twenty-five percent of the men and 29 percent of the women had stopped at the bachelor’s degree. The same proportion (32 percent) of men and women had completed master’s degrees. Forty percent of the men and 38 percent of the women had completed PhDs.