Jensen made this pronouncement in his magnum opus, The g Factor, published in 1998. The list of eminent scholars who have shared that view began with Cyril Burt and Lewis Terman in the early part of the twentieth century and continued through the rest of the century and into the twenty-first century with figures such as Raymond Cattell, Nathan Brody, Hans Eysenck, John Loehlin, David Geary, Diane Halpern, Thomas Bouchard, David Lubinski, and Camilla Benbow. I should add that Richard Herrnstein and I took the same position in The Bell Curve.
This does not mean that everyone accepts that the question has been settled. A lively and sometimes acrimonious debate has been ongoing in recent years that you may follow by checking the sources in the note.[61] It is still technically unsettled. My own sense—and it’s no more than that, from someone who is knowledgeable about IQ but not expert in the abstruse technical issues that are being disputed—is that the possibility of a trivial sex difference in g is still in play but the demonstration of a meaningful one is not.
Do Sex Differences in Abilities Diminish in Countries with Greater Gender Egality?
The question has different answers for academic abilities and measures of visuospatial skills.
Academic Abilities
PISA test results from the early 2000s gave reason to believe that greater gender egality had a meaningful relationship with academic test scores, but data since then have made that case increasingly tough to make.62 The emerging story is both more complicated and more interesting. It appears that all of the following are likely to be true:
Worldwide, overall sex differences in performance on math and science tests in the normal range are trivially small. The 2015 PISA survey included 67 countries. The overall mean effect size on the math test was –0.05—a tiny difference favoring boys.[63] The overall mean effect size on the science test was +0.01—no difference. The TIMSS survey of 2011 included 45 countries. The overall mean effect size on the math test was +0.04—a tiny difference favoring girls. The overall mean effect size on the science test was +0.05—a tiny difference again favoring girls.
The differences that do appear in some individual countries have a weak and inconsistent relationship with gender egality. Some analyses of the PISA and TIMSS surveys in the early 2000s found a negative correlation between the size of the sex difference in mathematics and the indexes of gender equality in the culture.64 When Gijsbert Stoet and David Geary analyzed all four PISA administrations from 2000 to 2009, they concluded that the patterns in the early 2000s were not sustained:
If anything, economically developed countries with strong sex-equality and human development scores tended to have a larger sex difference in mathematics than less economically developed countries.… Further, we found considerable variation among lower scoring countries, with some showing a large sex difference in mathematics achievement favoring boys and others favoring girls. In other words, the sex differences in mathematics were more consistently found among higher-achieving nations, a pattern which coincides with the larger sex difference in mathematics in high-achieving students.65
The results of the most recent administrations of the PISA and TIMSS tests are consistent with that finding.
When the standardized scores for the Gender Development Index (GDI), Gender Inequality Index (GII), and Global Gender Gap Index (GGI) are combined, the biggest effect sizes favoring boys in math were in Honduras, Austria, and Ghana. In science, they were again in Ghana and Honduras, plus Costa Rica. It’s hard to make much of that pattern with regard to gender egality in political and social institutions. But it’s even harder when you consider the biggest effect sizes favoring girls: Oman, Bahrain, and Jordan for math; Jordan, Albania, and the United Arab Emirates for science. These are not countries known for their enlightened gender policies. Taking the data from the last two decades as a whole, cross-national academic test scores show no significant relationship to measures of gender egality. Details are in the note.[66]
The most plausible explanation for the substantial effect sizes in math and science that appear in some individual countries is cultural, not biological. Why should some Arab countries that are notorious for legal and cultural discrimination against women produce female high school students who perform better in math than their privileged male classmates while nothing approaching the same gap favoring females is found elsewhere, either in countries with high gender egality or in non-Arab countries with low gender egality? It looks as if something about Arab socialization of children either depresses male incentives to do well in math and science or increases female incentives to do well in math and science.
The most plausible explanation for the consistent female advantage in verbal tests is biological, not cultural. The story for reading achievement in the PISA test echoes the consistent female advantage found in U.S. tests of verbal skills. Girls outscored boys in reading in every single PISA country, with effect sizes that ranged from a low of +0.08 in Peru to a remarkable high of +0.83 in Jordan. Nor was Jordan alone among nations with bad records on gender egality but large effect sizes favoring girl students. Other nations in the bottom half of the gender egality index but with effect sizes of +0.40 or higher were Algeria, the United Arab Emirates, Qatar, and Georgia. The mean effect size across all 67 PISA nations was +0.32. The correlation of the effect size with the egality index was –.11.[67]
It is difficult to reconcile the universal advantage of women in verbal tests with socialization or social role theories, neither of which has ever appealed to the idea that the oppression of women can enhance their cognitive ability. All the social-construct argumentation is based on the proposition that discrimination has suppressed female accomplishment. Nor can the argument be easily shifted by arguing that social roles encourage women to be more social and verbal, which is then reflected in superior verbal skills. The verbal test in PISA is not about sociability. It measures a cognitive ability to assimilate and analyze language that is as cognitively demanding as mathematics is in the nonverbal domain. There is no evidence that underlying verbal ability can be taught, either deliberately or through socialization. The parsimonious explanation for the international female advantage in verbal tests, across cultures that cover the full range from openly oppressive to aggressively gender-equal, is that women have a genetic advantage.
Measures of Visuospatial Skills
Some evidence indicates that sex differences in visuospatial skills are greater in countries with greater gender egality. In 2005, the BBC conducted an Internet survey of sex differences that included tests of mental rotation and line-angle judgment. Total sample sizes were 90,433 and 95,364 respectively, with sample sizes large enough to reliably explore sex differences for 53 countries. An analysis (first author was Richard Lippa) found, “Sex differences in mental rotation and line angle judgment performance were universally present across nations, with men’s mean scores always exceeding women’s mean scores.”[68] The mean national effect size was –0.47 for the mental rotation task and –0.49 for the line-angle judgment task, both favoring men and statistically significant at p < .001.[69]
“STATISTICALLY SIGNIFICANT AT P < .001”
The phrase statistically significant is commonly misunderstood. In assessing the statistical significance of a quantitative relationship, the null hypothesis is that no relationship exists. Suppose we are once again talking about the sex difference in height. The null hypothesis is that the mean heights of men and women are the same. The statistical test asks, “If the null hypothesis is true, how likely is it that I nonetheless got results at least this extreme by chance?” The statistic p is that probability. Thus the standard requirement for reaching statistical significance, p < .05, means that, if the null hypothesis were true, there would be less than a 5 percent probability of getting results at least as extreme as yours. A result of p < .001 means that the probability was less than one in a thousand.
“Statistically significant” doesn’t mean much by itself. Given a large enough sample, trivial effect sizes will be statistically significant. Given small enough samples, large effect sizes will fail to reach statistical significance. Sample sizes (n), effect sizes (d), and statistical significance (p) must be considered jointly.
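The interplay of n, d, and p can be seen in a rough sketch. It assumes two equal-sized groups and uses a normal approximation in place of an exact t-test. The function name and the sample sizes are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def approx_p_value(d: float, n_per_group: int) -> float:
    """Approximate two-sided p-value for a standardized mean difference d.

    Assumption: two groups of equal size and a normal approximation,
    so the test statistic is z = d * sqrt(n / 2).
    """
    z = abs(d) * math.sqrt(n_per_group / 2)
    # Two-sided tail probability under the null hypothesis of no difference.
    return 2 * NormalDist().cdf(-z)


# A trivial effect with a very large sample (hypothetical numbers).
tiny_effect_big_sample = approx_p_value(d=0.05, n_per_group=10_000)

# A large effect with a very small sample (hypothetical numbers).
large_effect_small_sample = approx_p_value(d=0.80, n_per_group=8)

print(f"d = 0.05, n = 10,000 per group: p = {tiny_effect_big_sample:.5f}")
print(f"d = 0.80, n = 8 per group:      p = {large_effect_small_sample:.3f}")
```

Under these assumptions the trivial effect clears p < .001 while the large effect fails to reach p < .05, which is why the three quantities have to be read together.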
The Lippa study then calculated the correlations between national effect sizes of sex differences and four measures of national development: the UN Gender Development Index, UN Gender Empowerment Index, per capita income, and life expectancy. For all of these measures, “high” equals “good” (more gender egalitarian or economically developed), so, according to social-construct theories, the correlations with the size of the gender difference should be negative (the effect sizes should be smaller for more egalitarian or developed societies). The table below shows the correlation coefficients from the Lippa study after controlling for age and education.
Correlation after adjusting for age and education

| Index of national development | Mental rotation | Line-angle judgment |
| --- | --- | --- |
| UN Gender Development Index | +.42* | +.47* |
| UN Gender Empowerment Index | +.11 | +.31* |
| Per capita income | +.08 | +.42* |
| Life expectancy | +.33* | +.68* |
Source: Adapted from Lippa, Collaer, and Peters (2010): Tables 1 and 2. Asterisk indicates that p < .05.
The more advanced the country, the wider the sex differences in both visuospatial tasks. The relationship was stronger on the line-angle judgment task—all four indices were significantly correlated with the effect size, at the p < .01 level or better for three of the four. For the mental rotation task, the correlations were significant at the p < .01 level for both the UN Gender Development Index and life expectancy. But the main point of the table is that not a single correlation, large or small, is negative—a finding directly at odds with expectations of the social-construct logic.
Why should these differences in visuospatial skills be wider in more developed countries? Lippa offers potential explanations based on the greater effects of stereotype threat in advanced countries and evolutionary theories that posit greater sensitivity of males to environmental challenges, but these remain only hypotheses.70 Nobody knows.
Recapitulation (and Integration)
I have bombarded you with a great many numbers about a great many different kinds of male and female differences in neurocognitive functioning. Two integrative analyses, conducted by leading scholars in their respective fields, help to see the broader picture.
Patterns on a Broad Neurocognitive Battery
First, consider the profiles of neurocognitive functioning found in a major recent study of neurocognitive sex differences in children and young adults. It was led by psychologists Ruben and Raquel Gur. They examined the largest and best-documented sample of its kind, the Philadelphia Neurodevelopmental Cohort (PNC). It consists of 9,122 persons ages 8 to 21, divided between 4,405 males and 4,717 females.
The participants were administered the Computerized Neurocognitive Battery (CNB). A neurocognitive battery of tests is not the same as an IQ test battery that is being used to measure different aspects of g. Rather, neurocognitive refers to bits and pieces of the way a person’s brain works, focusing on ones that can be linked to the functioning of specific brain systems. The most common categories covered by the major tests of neurocognitive functioning include executive function (such things as mental flexibility, planning, and strategic decisions), memory, complex cognition (verbal and visuospatial facility), social/emotional cognition, and sensorimotor function. A neurocognitive battery commonly contains at least 10 subtests, and some contain a few dozen.
The battery administered to the Philadelphia Neurodevelopmental Cohort consists of 14 subtests designed to measure executive function, episodic memory, complex cognition, social cognition, and sensorimotor and motor function. Twelve of the subtests have two measures: the accuracy and speed of the participant’s performance. The other two measure only the speed of motor and sensorimotor function. In all, the test yielded 26 male-female comparisons. Twelve of them amounted to an absolute effect size of less than 0.1. Women outscored men on six of the seven measures of accuracy with an effect size greater than 0.1, and they outscored men on four of the seven measures of speed with an effect size greater than 0.1.[71] The highlights are similar to findings you have already encountered:
Females had more accurate memory for items involving words and people.
On IQ-like items, women did better on the verbal ones; men did better on the spatial ones.
On the three subtests measuring social cognition, females were both more accurate and faster than males on all of them.
On the subtest measuring motor speed, males were faster than females.
The authors describe another pattern that did not involve specific subtests, but rather an overall construct called within-individual variability (WIV), referring to the evenness or unevenness of performance on the test battery. A participant with high scores on some subtests and low ones on others has high WIV; a participant who is near the same point on the distribution on all the tests has low WIV. In the technical literature, high WIV is associated with cognitive specialization, while people with low WIV are considered to be cognitive generalists.72 Males in the Philadelphia Neurodevelopmental Cohort had higher WIV than females on both speed and accuracy for almost all ages from 8 to 21, and the difference was most pronounced in the oldest participants.73
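The idea of within-individual variability can be illustrated with a short sketch. It assumes WIV is computed as the standard deviation of one participant's standardized (z) scores across subtests, which may not match the study's exact procedure. The two score profiles are invented for illustration.

```python
from statistics import mean, stdev


def within_individual_variability(z_scores: list[float]) -> float:
    """Spread of one person's z-scores across subtests.

    Assumption: WIV is taken to be the sample standard deviation
    of the person's own subtest z-scores.
    """
    return stdev(z_scores)


# Hypothetical profiles with the same average level of performance.
specialist = [1.5, -1.0, 1.2, -0.8, 0.1]  # uneven: strong on some, weak on others
generalist = [0.3, 0.2, 0.4, 0.1, 0.0]    # even: similar on every subtest

for label, profile in (("specialist", specialist), ("generalist", generalist)):
    wiv = within_individual_variability(profile)
    print(f"{label}: mean z = {mean(profile):.2f}, WIV = {wiv:.2f}")
```

Both invented profiles average the same z-score, so the difference in WIV reflects only how evenly performance is distributed across subtests, not the overall level.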
The magnitude of the effect sizes ranged from small to medium. Given such large sample sizes, all but two of the differences were statistically significant. Ruben and Raquel Gur summarized their findings this way. The full citations for the references they mention are included in the note:
In summary, behavioral measures linked to brain function indicate significant sex differences in performance that emerge early in development with domain variability that relates to brain maturation. Notably, our findings are in line with a robust literature documenting sex difference in laterality and behavior (e.g., Linn and Petersen 1985; Thomas and French 1985; Voyer et al., 1995; Halpern et al., 2007; Williams et al., 2008; Hines 2010; Moreno-Briseño et al., 2010). These findings support the notion that males and females have complementary neurocognitive abilities, with females being more generalists and outperforming males in memory and social cognition tasks and males being more specialists and performing better than females on spatial and motor tasks.74 [Emphasis added.]
We will get to the “sex differences in laterality” reference in chapter 5. For now, the Gurs’ summary is a concise way of expressing the pattern of differences that the individual sections of this chapter have described.
Male and Female Differences in Cognitive Toolboxes
Even when men and women get the same answers to their cognitive tasks, they often get there by different routes. For example, people with high verbal skills often get the right answer to mathematics problems, but by using verbal forms of logic rather than mathematical symbols or spatial reasoning. Another well-documented example is how people navigate from point A to point B. Women tend to identify and remember landmarks—a strategy that taps into the female advantage in memory. Men tend to construct a mental map of the route—a strategy that taps into the male advantage in visuospatial skills.75 Both methods work equally well for a wide variety of navigation tasks.76 People are just using different sets of tools to get the job done.
In the early 2000s, Wendy Johnson and Thomas Bouchard, senior psychologists at the famed Minnesota Institute for the Study of Twins Raised Apart (MISTRA), decided to extend the metaphor of cognitive tools.77 Using an analogy, they hypothesized that everyone has an “intellectual toolbox,” but no two are exactly alike. They are stocked with varying tools that people use with different frequencies, different degrees of skill, and in different ways, and there are systematic toolbox differences between men and women. On average, men and women can accomplish most intellectual tasks equally well with their different choices and uses of tools. Hence the similarity in overall g. “But some tasks can be accomplished much better with certain tools than with others,” Johnson and Bouchard write, “and individual performance on these tasks depends not only on skill in tool use, but also to some degree on individual toolbox composition.… The analogy is incomplete, of course, but it makes clear the question we address in this paper, namely, what are the differences in specific tool use (mental abilities) of men and women when overall skill in tool choice and use (g) is removed?”78
Johnson and Bouchard used the MISTRA sample, consisting of adult twins raised apart along with many of their spouses, partners, adoptive and biological family members, and friends. The sample was not representative, but its members came from a wide range of backgrounds, and the researchers had extraordinarily thorough information about them. All of them had gone through at least one weeklong assessment of medical and physical traits plus psychological tests of cognitive abilities, personality, interests, and attitudes.
Johnson and Bouchard used sophisticated quantitative methods. Describing them would take us far afield (it was a combination of factor analysis and regression analysis), but the result is simple enough to understand.
Imagine a man and woman with equal general intelligence (g). The woman uses her elevated verbal skills to help her solve math problems while the man uses his elevated visuospatial skills to help him solve math problems. They take two math tests. One consists entirely of problems expressed in mathematical notation. The other consists of math problems expressed in words. They both get most of the items right on both tests, because g goes a long way toward enabling people to solve math problems no matter what their special skills might be. But the woman gets a slightly higher score than the man on the word-problem test while the man gets a slightly higher score on the one using mathematical notation. The net result is no sex difference. But actually there was a difference in tools that the man and woman used. What Johnson and Bouchard did was to strip away the role played by g and let us see the differences in tools. A more precise description is given in the note.[79]