On these issues, everyone who writes about sex differences should put their personal perspectives on the table. Regarding the use of Cohen’s guidelines, I think Hyde’s reliance on them to defend the gender similarities hypothesis is misplaced. There are too many ways in which effect sizes defined as “small” by Cohen’s guidelines can have important aggregate effects when thinking about sex differences. I appeal to the arguments made by the scholars I have cited, including Cohen himself, in defense of my position.
I also disagree with Hyde’s position that Type I errors should still be more feared than Type II errors. If we were back in 1960, I would agree with her—many people assumed that men and women were separated by large differences, and research that falsely reinforced that assumption could perpetuate harmful stereotypes, just as Hyde argues. But I’m writing at the end of the second decade of the twenty-first century when so many things, from high school athletic programs to the military’s composition of combat units, are guided by the assumption that there are no relevant sex differences. My guess is that the situation in 1960 has been reversed: More harms are now inflicted by incorrectly ignoring sex differences than by incorrectly exaggerating them. At the least, it can be said that there’s no clear case that Type I error is still more harmful than Type II error. This is an argument that does not lend itself to data-driven resolution. Differences in perspective are embedded in the literature on sex differences. It is well to be transparent about them.
Should Individual Effect Sizes Be Treated Individually or Aggregated?
My more important difference with Hyde involves her insistence on treating sex differences as independent bits and pieces rather than as profiles. When are traits of personality, ability, and social behavior rightly treated independently? When should they be added up? These questions come up all the time in the social and behavioral sciences, and there are no cookbook recipes to go by.
To illustrate, let’s say we’re investigating personality differences and discover that people in Group A (the group could be based on any kind of common membership, not just sex) are somewhat more outgoing on average than people in Group B, with “somewhat” meaning that d = +0.35.
We get to know these groups better and determine that Group A is also somewhat warmer on average than Group B, with d = +0.35. Should we represent the two groups as separated by a mean personality difference of +0.35? Add the two effect sizes and say they are separated by a difference of +0.70? Or something in between?
I say that the answer is something close to +0.35. Outgoing and warm are nearly synonymous. The additional information hasn’t given us reason to think that the two groups of people are much more different than we already knew.
Suppose instead that we determine that Group A is also more emotionally stable than Group B, with d = +0.35. Should we continue to represent the two groups as separated by an average of +0.35? An aggregate of +0.70? Or something in between?
This time, I argue that the answer has to be closer to +0.70. We’re comparing people who are both warmer and more emotionally stable with people who are more aloof and easily upset. The personalities of the two groups are (on average) definitely more different than we knew before.
We continue to learn more about the two groups. We learn that one group is more prudent, the other more happy-go-lucky; one group is more practical, the other more imaginative; and so on. In some cases, the additional traits on which the groups differ are so closely related that the new knowledge adds only a small amount to the difference; in other cases, the new information adds a lot to the degree of their difference. But whether increments are small or large, my view is that individual differences that are conceptually related should routinely be aggregated.
Psychologist Marco Del Giudice, a leading advocate for aggregating sex differences in personality, uses an analogy with the distance between towns. If I tell you that one town is 35 miles west and 35 miles north of another town and ask you the Euclidean distance between the two, it wouldn’t occur to you to take the average of the two and announce that the towns were 35 miles apart. Similarly, it wouldn’t occur to you to add the two and say that the towns are 70 miles apart. You realize that we’re talking about a right triangle and that the hypotenuse is the distance between the two towns. You remember the Pythagorean theorem and know that the distance is therefore the square root of 35² + 35², which works out to about 49.5 miles. If I were then to tell you that the altitude of the two towns differed by 4,000 feet, you would have to recalculate, taking the third dimension of height into account.
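The arithmetic in the town analogy can be checked with a short sketch. It assumes the two offsets are perpendicular (35 miles west, 35 miles north) and converts the 4,000-foot altitude difference to miles at 5,280 feet per mile. The function name is illustrative only.

```python
import math

FEET_PER_MILE = 5280

def straight_line_distance(*offsets_in_miles):
    """Euclidean distance from perpendicular offsets (Pythagorean theorem)."""
    return math.sqrt(sum(x ** 2 for x in offsets_in_miles))

# Two dimensions: 35 miles west and 35 miles north.
flat = straight_line_distance(35, 35)

# Three dimensions: add the 4,000-foot altitude difference, expressed in miles.
altitude_miles = 4000 / FEET_PER_MILE
with_altitude = straight_line_distance(35, 35, altitude_miles)

print(f"{flat:.1f} miles")           # falls between the average (35) and the sum (70)
print(f"{with_altitude:.3f} miles")  # slightly larger once height is included
```

The altitude term changes the answer only slightly because 4,000 feet is less than a mile, but the recalculation follows the same rule: square each offset, add, and take the square root.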
I like the analogy in part because the correct answer is so intuitively satisfying: We neither treat the three measures of distance separately nor simply combine the raw measures. Some method of aggregation that falls between averaging and simple addition seems right.
If you still want to average traits or treat them separately, my argument does not compel you to change your mind. I’ve made it through analogy and an appeal to intuition. But you should come to grips with how radical your solution is. If two indicators are involved, averaging cuts the simple sum of the two effect sizes by half. With three indicators, it cuts the simple sum by two-thirds. Suppose 10 indicators are involved. Averaging the results gives you an estimate of the sex difference that is just one-tenth of the estimate you would get by adding up the effect sizes. Doesn’t that seem like too much of a discount? This is a nontechnical way of saying that cognitive repertoires commonly involve multidimensional constructs, and the measure of male-female differences must be multidimensional as well.[17]
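The size of that discount is easy to tabulate. The sketch below assumes, purely for illustration, that every indicator has the same effect size of 0.35 (the value from the Group A and Group B example) and compares the average with the simple sum as indicators are added.

```python
def average_and_sum(effect_sizes):
    """Return (average, simple sum) for a list of effect sizes."""
    total = sum(effect_sizes)
    return total / len(effect_sizes), total

D_EXAMPLE = 0.35  # hypothetical effect size shared by every indicator

for k in (2, 3, 10):
    avg, total = average_and_sum([D_EXAMPLE] * k)
    discount = 1 - avg / total  # fraction of the simple sum removed by averaging
    print(f"{k:>2} indicators: average={avg:.2f}, sum={total:.2f}, discount={discount:.0%}")
```

Whatever the individual effect sizes are, the average is the sum divided by the number of indicators, so the discount is always 1 − 1/k.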
In the same way that it is possible to compute the geographical distance separating two towns given two measures of their distance on the cardinal points of the compass, it is possible to compute distance in multidimensional space. The most widely used statistic for expressing multivariate distance is called Mahalanobis D, named after the Indian statistician, Prasanta Mahalanobis, who developed it. The algorithm for calculating D does what I have argued intuition tells us it should, taking correlations into account. Suppose that variables have correlations near zero. D converges on the Euclidean distance. The higher the correlation between variables, the less D is augmented by including them. When a new variable is a linear combination of variables already in the equation, D is not augmented at all.[18] The note also gives you references disputing Del Giudice’s position (one of them by Hyde) and his response to them.
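How correlation tempers D can be seen in the simplest case of two traits. The sketch below is an illustration under stated assumptions, not the procedure used in any of the cited studies. It takes two standardized mean differences, both in the same direction, and a within-group correlation r between the traits, and it applies the two-variable form of the Mahalanobis formula. The function name and the input values are hypothetical.

```python
import math

def mahalanobis_d_two_traits(d1, d2, r):
    """Mahalanobis D for two standardized mean differences d1 and d2
    whose traits correlate r within groups (requires -1 < r < 1)."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    # Two-variable case of D^2 = d' R^-1 d, where R = [[1, r], [r, 1]].
    d_squared = (d1 ** 2 - 2 * r * d1 * d2 + d2 ** 2) / (1 - r ** 2)
    return math.sqrt(d_squared)

# Hypothetical inputs: two traits that each differ by d = 0.35.
for r in (0.0, 0.5, 0.9):
    print(f"r={r:.1f}: D={mahalanobis_d_two_traits(0.35, 0.35, r):.3f}")
```

With r = 0 the result equals the Euclidean distance for the two differences. As r rises toward 1, D shrinks back toward 0.35, the difference on a single trait, which is the behavior described above for nearly redundant variables.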
In assessing the various arguments for and against, three points need to be kept in mind. First, Mahalanobis D or any other method of aggregation must be used cautiously. In all complex statistical analyses, the validity of the results depends on interpreting the statistic with its limitations in mind.
But that leads to my second point: When I talk about indicators of sex differences being “conceptually related,” I am not appealing to esoteric social science abstractions. To go back to my example, traits like warmth and emotional stability are characteristics with which we’re all familiar from everyday life. We can effortlessly think of them as continua from coldly aloof to gushingly friendly; from rock-solid calm to emotionally volcanic. We’ve had experience with people who have different combinations of the two traits. In the same way, given normal standards of technical care in the application of multidimensional measures of distance and a clear narrative description of the logic for combining traits, aggregated measures of multidimensional distance can enhance our understanding of sex differences.
My third point is that in the real world it is taken for granted that small differences add up. Imagine a tennis match. You know that both players are professionals, but that’s all you know. You have to bet on one of them. You learn that one player is 10 percent taller than the other. That doesn’t give you much to go on; all you need is fractionally better than 1:1 odds to bet on the other guy. But suppose you then learn that the taller player also has 10 percent greater wingspan, 10 percent greater strength, 10 percent more endurance, 10 percent faster foot speed, 10 percent faster serve speed, 10 percent higher percentage of first serves, 10 percent faster reaction time, and 10 percent more emotional control. Now what kind of odds do you require to bet on the other guy?
I should add that my position makes virtually no practical difference to the discussions in the next four chapters. Almost all of the effect sizes I report are plain vanilla Cohen’s d. I have given so much space to this topic because I think that treating effect sizes individually or averaging them has underestimated male-female differences. If you are unpersuaded, I will rest my case with the example of sex differences in the human face. Adult female and male faces are distinguished by dozens of tiny morphological differences. But they add up. Consider the following two faces:
[Figure: two faces, a female’s on the left and a male’s on the right.]
Source: Adapted from Rhodes, Jeffery, Watson et al. (2004).
Describing precisely why those two faces are so obviously a female’s on the left and a male’s on the right is daunting. The individual differences are almost imperceptible. But one thing is sure: To average out all those tiny individual differences and conclude that “male and female faces are virtually indistinguishable” would be ridiculous. The estimate of overall sex difference in faces must be expressed as some sort of aggregation. I submit that the same holds true for all sex differences comprised of functionally distinctive but conceptually related traits.
2
Sex Differences in Personality
Proposition #1: Sex differences in personality are consistent worldwide and tend to widen in more gender-egalitarian cultures.
Bimbo. Jock. Feminine. Macho. A great lady. A true gentleman. Males and females have been stereotyping each other from time out of mind, positively and negatively. Almost all of the stereotypes are about personality characteristics that are thought to break along the lines of sex. Some do and some don’t. At the end of the review of the evidence in this chapter, I defy anyone to conclude that either sex has a superior personality profile. They’re just different. Some of the most coherent ways they’re different correspond to the People-Things dimension.
Sex Differences in Psychiatric and Neurological Conditions
The most extreme expressions of personality characteristics manifest themselves as personality disorders. All of them are known to have genetic causes; some are also known to have environmental causes. One thing is certain: Their incidence rates differ markedly across the sexes. In a 2017 review article, neuroscientist Margaret McCarthy and her colleagues summarized the sex imbalance of incidence rates in a table that I present in an abbreviated version below.[1]
SEX DIFFERENCES IN PERSONALITY DISORDERS
Onset | Condition | Sex with greater prevalence | Approximate proportion of cases
Childhood | Autism spectrum disorder | Male | 80–90%
Childhood | Conduct/oppositional defiance disorder | Male | 75%
Childhood | Attention deficit hyperactivity disorder | Male | 66–75%
Childhood | Schizophrenia | Male | 60%
Childhood | Dyslexia and/or reading impairment | Male | 66–75%
Childhood | Stuttering | Male | 70%
Childhood | Tourette syndrome | Male | 75–80%
Adult | Major depression | Female | 66%
Adult | Bipolar II disorder* | Female | Unspecified
Adult | Generalized anxiety | Female | 66%
Adult | Panic disorder | Female | 70%
Adult | Obsessive-compulsive disorder | Female | 60%
Adult | Post-traumatic stress syndrome | Female | 66%
Adult | Anorexia nervosa | Female | 75%
Adult | Bulimia | Female | 75–80%
Adult | Alcoholism or substance abuse | Male | Unspecified
Source: Adapted from McCarthy, Nugent, and Lenz (2017): Table 2. The original table includes references.
* Bipolar II is characterized by at least one episode of major depression lasting two or more weeks and at least one hypomanic episode.
At this point, I just want to put the existence of these well-documented and important sex differences on the table.[2] Possible biological causes will be discussed in chapter 5.
Sex Differences in Personality Within the Normal Range
Now I turn to adult personality profiles. We know from everyday experience that personality characteristics tend to cluster. The person who is the life of the party tends to enjoy being around other people elsewhere. The person who is a hypochondriac also tends to fret about other things. In the 1940s, psychometricians led by Raymond Cattell began to explore how personality “facets,” the detailed indicators of personality characteristics, clustered into larger constructs—“factors.”[3] Over several years, Cattell and his colleagues developed a model that had 16 factors and a self-report personality test called the Sixteen Personality Factor Questionnaire, labeled 16PF. It is now in its fifth edition and continues to be widely used.
By the 1980s, another personality model had gained wide currency. It is known colloquially as the Big Five model, the label I will use.[4] The factor that explains the most variance is neuroticism, which I will relabel emotional stability (see the box below). The other four, in descending order of the variance they explain, are extraversion, openness, agreeableness, and conscientiousness. The first widely accepted test was based on work by Paul Costa and Robert McCrae of the National Institutes of Health. I will refer to it as the Five Factor Model (FFM) inventory.[5]
NEUROTICISM OR EMOTIONAL STABILITY?
Every personality characteristic has a continuum that goes from one extreme to the other, and neither extreme is desirable.[6] For example, agreeableness at one extreme indicates an unquestioningly acquiescent person; at the other extreme, it indicates a reflexively antagonistic person. Four of the Big Five factors have labels that describe a moderately positive position on the continuum. One label, neuroticism, is not only negative but, to most ears, extremely negative. In the technical literature, scholars increasingly use a moderately positive label for this factor, emotional stability. I do so as well.
Other personality models have been developed, but the 16PF and FFM inventories continue to be the ones with the largest databases and the most cross-national databases.[7] I focus on three surveys of adults: the U.S. standardization sample of Costa and McCrae’s FFM inventory in 1992 (n = 1,000), hereafter called the Costa study; a 2018 replication using the open-access version of the FFM inventory by psychologists Petri Kajonius and John Johnson (n = 320,128), hereafter called the Kajonius study; and the analysis by psychologists Marco Del Giudice, Tom Booth, and Paul Irwing of the U.S. standardization sample for the fifth edition of the 16PF inventory (n = 10,261), hereafter called the Del Giudice study.
Personality Sex Differences in the United States
It is appropriate to begin by emphasizing that on many important personality traits, the differences between men and women are quite small. These trivial differences apply to many characteristics that are sometimes ascribed to men (e.g., “assertive or forceful in expression,” “self-reliant, solitary, resourceful”) and ones that are sometimes ascribed to women (e.g., “open to the inner world of imagination,” “lively, animated, spontaneous”). The full list is given in the note.[8]
Among the traits on which men and women differ, some of the largest effect sizes are consistent with the higher prevalence of depression among women. In the FFM inventory, women experienced more free-floating anxiety than men (d = +0.40 and +0.56 for the Costa and Kajonius studies respectively) and were more vulnerable to stress (d = +0.44 and +0.54). In the 16PF inventory, women were more apprehensive, self-doubting, and worried (d = +0.60 in the Del Giudice study).[9]
Some of the substantively significant sex differences correspond to traditional stereotypes about feminine sensibility. In the FFM inventory, women were more appreciative of art and beauty than were men (d = +0.34 and +0.33 for the Costa and Kajonius studies respectively), were more open to inner feelings and emotions (d = +0.28 and +0.64), were more modest in playing down their achievements (d = +0.38 and +0.45), and were more reactive, affected by feelings, and easily upset (d = +0.53). In the 16PF inventory, several stereotypical characteristics were combined into one factor, “sensitive, aesthetic, sentimental,” with a whopping d of +2.29.
Human Diversity Page 4