by David Reich
This means that our genomes hold within them a multitude of ancestors. Any person’s genome is derived from 47 stretches of DNA corresponding to the chromosomes transmitted by mother and father plus mitochondrial DNA. One generation back, a person’s genome is derived from about 118 (47 plus 71) stretches of DNA transmitted by his or her parents. Two generations back, the number of ancestral stretches of DNA grows to around 189 (47 plus 71 plus another 71) transmitted by four grandparents. Look even further back in time, and the additional increase in ancestral stretches of DNA every generation is rapidly overtaken by the doubling of ancestors. Ten generations back, for example, the number of ancestral stretches of DNA is around 757 but the number of ancestors is 1,024, guaranteeing that each person has several hundred ancestors from whom he or she has received no DNA whatsoever. Twenty generations in the past, the number of ancestors is almost a thousand times greater than the number of ancestral stretches of DNA in a person’s genome, so it is a certainty that each person has not inherited any DNA from the great majority of his or her actual ancestors.
These calculations mean that a person’s genealogy, as reconstructed from historical records, is not the same as his or her genetic inheritance. The Bible and the chronicles of royal families record who begat whom over dozens of generations. Yet even if the genealogies are accurate, Queen Elizabeth II of England almost certainly inherited no DNA from William of Normandy, who conquered England in 1066 and who is believed to be her ancestor twenty-four generations back in time.21 This does not mean that Queen Elizabeth II did not inherit DNA from ancestors that far back, just that it is expected that only about 1,751 of her 16,777,216 twenty-fourth-degree genealogical ancestors contributed any DNA to her. This is such a small fraction that the only way William could plausibly be her genetic ancestor is if he was her genealogical ancestor in thousands of different lineage paths, which seems unlikely even considering the high level of inbreeding in the British royal family.
Figure 4. The number of ancestors you have doubles every generation back in time. However, the number of stretches of DNA that contributed to you increases by only around seventy-one per generation. This means that if you go back eight or more generations, it is almost certain that you will have some ancestors whose DNA did not get passed down to you. Go back fifteen generations and the probability that any one ancestor contributed directly to your DNA becomes exceedingly small.
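The arithmetic behind these figures can be checked with a short sketch. It assumes only the round numbers given in the text: 47 stretches in a person's own genome, about 71 more for each generation back, and a doubling of ancestors each generation. The function names are illustrative, and the sketch ignores the fact that the same person can appear in a genealogy by more than one path.

```python
# Round figures taken from the text; everything else here is illustrative.
BASE_STRETCHES = 47                 # chromosomes from both parents plus mitochondrial DNA
NEW_STRETCHES_PER_GENERATION = 71   # approximate increase per generation back


def ancestral_stretches(generations_back: int) -> int:
    """Approximate number of ancestral stretches of DNA at a given depth."""
    return BASE_STRETCHES + NEW_STRETCHES_PER_GENERATION * generations_back


def genealogical_ancestors(generations_back: int) -> int:
    """Number of slots in the family tree, doubling every generation."""
    return 2 ** generations_back


for g in (1, 2, 10, 20, 24):
    stretches = ancestral_stretches(g)
    ancestors = genealogical_ancestors(g)
    # Each stretch comes from a single ancestor, so this ratio is an upper
    # bound on the fraction of ancestors who contributed any DNA at all.
    max_fraction = min(1.0, stretches / ancestors)
    print(g, stretches, ancestors, f"{max_fraction:.6f}")
```

The stretches grow by a fixed amount each generation while the ancestors double, which is why the fraction collapses so quickly once the two curves cross.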
Going back deeper in time, a person’s genome gets scattered into more and more ancestral stretches of DNA spread over ever-larger numbers of ancestors. Tracing back fifty thousand years in the past, our genome is scattered into more than one hundred thousand ancestral stretches of DNA, greater than the number of people who lived in any population at that time, so we inherit DNA from nearly everyone in our ancestral population who had a substantial number of offspring at times that remote in the past.
There is a limit, though, to the information that comparison of genome sequences provides about deep time. At each place in the genome, if we trace back our lineages far enough into the past, we reach a point where everyone descends from the same ancestor, beyond which it becomes impossible to obtain any information about deeper time from comparison of the DNA sequences of people living today. From this perspective, the common ancestor at each point in the genome is like a black hole in astrophysics, from which no information about deeper time can escape. For mitochondrial DNA this black hole occurs around 160,000 years ago, the date of “Mitochondrial Eve.” For the great majority of the rest of the genome the black hole occurs between five million and one million years ago, and thus the rest of the genome can provide information about far deeper time than is accessible through analysis of mitochondrial DNA.22 Beyond this, everything goes dark.
The power of tracing this multitude of lineages to reveal the past is extraordinary. In my mind’s eye, when I think of a genome, I view it not as a thing of the present, but as deeply rooted in time, a tapestry of threads consisting of lines of descent and DNA sequences copied from parent to child winding back into the distant past. Tracing back, the threads wind themselves through ever more ancestors, providing information about population size and substructure in each generation. When an African American person is said to have 80 percent West African and 20 percent European ancestry, for example, a statement is being made that about five hundred years ago, prior to the population migrations and mixtures precipitated by European colonialism, 80 percent of the person’s ancestral threads probably resided in West Africa and the remainder probably lived in Europe. But such statements are like still frames in a movie, capturing one point in the past. An equally valid perspective is that one hundred thousand years ago, the vast majority of lineages of African American ancestors, like those of everyone today, were in Africa.
The Story Told by the Multitudes in Our Genomes
In 2001, the human genome was sequenced for the first time—which means that the great majority of its chemical letters were read. About 70 percent of the sequence came from a single individual, an African American,23 but some came from other people. By 2006, companies began selling robots that reduced the cost of reading DNA letters by more than ten thousandfold and soon by one hundred thousandfold, making it economical to map the genomes of many more people. It thus became possible to compare sequences not just from a few isolated locations, such as mitochondrial DNA, but from the whole genome. That made it possible to reconstruct each person’s tens of thousands of ancestral lines of descent. This revolutionized the study of the past. Scientists could gather orders of magnitude more data, and test whether the history of our species suggested by the whole genome was the same as that told by mitochondrial DNA and the Y chromosome.
A 2011 paper by Heng Li and Richard Durbin showed that the idea that a single person’s genome contains information about a multitude of ancestors was not just a theoretical possibility, but a reality. To decipher the deep history of a population from a single person’s DNA, Li and Durbin leveraged the fact that any single person actually carries not one but two genomes: one from his or her father and one from his or her mother.24 Thus it is possible to count the number of mutations separating the genome a person receives from his or her mother and the genome the person receives from his or her father to determine when they shared a common ancestor at each location. By examining the range of dates when these ancestors lived—plotting the ages of one hundred thousand Adams and Eves—Li and Durbin established the size of the ancestral population at different times. In a small population, there is a substantial chance that two randomly chosen genome sequences derive from the same parent genome sequence, because the individuals who carry them share a parent. However, in a large population the chance is far lower. Thus, the times in the past when the population size was low can be identified based on the periods in the past when a disproportionate fraction of lineages have evidence of sharing common ancestors. Walt Whitman, in the poem “Song of Myself,” wrote, “Do I contradict myself? / Very well, then I contradict myself, / (I am large, I contain multitudes).” Whitman could just as well have been talking about the Li and Durbin experiment and its demonstration that a whole population history is contained within a single person as revealed by the multitude of ancestors whose histories are recorded within that person’s genome.
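The two ideas in this argument can be written down as a minimal sketch. It is not Li and Durbin's actual method, which is a far more elaborate statistical model. It assumes an idealized, randomly mating population of constant size and a constant mutation rate; the function names and the example mutation rate are hypothetical choices for illustration.

```python
def chance_of_shared_parent_copy(num_genome_copies: int) -> float:
    """Chance that two randomly chosen genome copies descend from the same
    parent copy one generation back, in an idealized population.

    Assumption: every copy in the previous generation is equally likely to
    be the parent, so the chance is one over the number of copies.
    """
    return 1.0 / num_genome_copies


def generations_to_common_ancestor(differences_per_letter: float,
                                   mutations_per_letter_per_generation: float) -> float:
    """Estimate how long ago two sequences shared an ancestor at one location.

    Mutations accumulate on both lines of descent, so the differences seen
    today reflect twice the elapsed time multiplied by the mutation rate.
    """
    return differences_per_letter / (2.0 * mutations_per_letter_per_generation)


# A small population makes recent shared ancestors much more likely.
print(chance_of_shared_parent_copy(2_000))
print(chance_of_shared_parent_copy(200_000))

# Hypothetical inputs: one difference per thousand letters and an assumed
# mutation rate of 1.25e-8 per letter per generation.
print(generations_to_common_ancestor(1e-3, 1.25e-8))
```

Periods in which many locations in the genome date their common ancestor to the same window are, under these assumptions, periods when the first quantity was high, which is to say when the population was small.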
An unanticipated finding of the Li and Durbin study was its evidence that after the separation of non-African and African populations, there was an extended period in the shared history of non-Africans when populations were small, as reflected in evidence for many shared ancestors spread over tens of thousands of years.25 A shared “bottleneck event” among non-Africans—when a small number of ancestors gave rise to a large number of descendants today—was not a new finding. But prior to Li and Durbin’s work, there was no good information about the duration of this event, and it seemed plausible that it could have transpired over just a few generations—for example, a small band of people crossing the Sahara into North Africa, or from Africa into Asia. The Li and Durbin evidence of an extended period of small population size was also hard to square with the idea of an unstoppable expansion of modern humans both within and outside Africa around fifty thousand years ago. Our history may not be as simple as the story of a dominant group that was immediately successful wherever it went.
How the Whole-Genome Perspective Put an End to Simple Explanations
The newfound ability to take a whole-genome view of human biology, made possible by leaps in technology in the last decades, has allowed reconstruction of population history in far more detail than had been previously possible. In doing so it revealed that the simple picture from mitochondrial DNA, and the just-so stories about one or a few changes propelling the Later Stone Age and Upper Paleolithic transitions when recognizably modern human behavior became widespread as reflected in archaeological sites across Africa and Eurasia, are no longer tenable.
In 2016, my colleagues and I used an adaptation of the Li and Durbin method26 to compare populations from around the world to the earliest branching modern human lineage that has contributed a large proportion of the ancestry of a population living today: the one that contributed the lion’s share of ancestry to the San hunter-gatherers of southern Africa. Our study,27 like most others,28 found that the separation had begun by around two hundred thousand years ago and was mostly complete by more than one hundred thousand years ago. The evidence for this is that the density of mutations separating San genomes from non-San genomes is uniformly high, implying few shared ancestors between San and non-San in the last hundred thousand years. “Pygmy” groups from Central African forests harbor ancestry that is arguably just as distinctive. The extremely ancient isolation of some pairs of human populations from each other conflicts with the idea that a single mutation essential to distinctively modern human behavior occurred shortly before the Upper Paleolithic and Later Stone Age. A key change essential to modern human behavior in this time frame would be expected to be at high frequency in some human populations today—those that descend from the population in which the mutation occurred—and absent or very rare in others. But this seems hard to reconcile with the fact that all people today are capable of mastering conceptual language and innovating their culture in a way that is a hallmark of modern humans.
A second problem with the notion of a genetic switch became apparent when we applied the Li and Durbin method to search for places where all the genomes we analyzed shared a common ancestor in the period before the Upper Paleolithic and Later Stone Age. At FOXP2—the gene that seemed the best candidate for a switch based on previous studies—we found that the common ancestor of everyone living today (that is, the person in whom modern humanity’s shared copy of FOXP2 last occurred), lived more than one million years ago.29
Expanding our analysis to the whole genome, we could not find any location—apart from mitochondrial DNA and the Y chromosome—where all people living today share a common ancestor less than about 320,000 years ago. This is a far longer time scale than the one required by Klein’s hypothesis. If Klein was right, it would be expected that there would be places in the genome, beyond mitochondrial DNA and the Y chromosome, where almost everyone shares a common ancestor within the last hundred thousand years. But these do not in fact seem to exist.
Our results do not completely rule out the hypothesis of a single critical genetic change. There is a small fraction of the genome that contains complicated sequences that are difficult to study and that was not included in our survey. But the key change, if it exists, is running out of places to hide. The time scale of human genetic innovation and population differentiation is also far longer than mitochondrial DNA and other genetic data suggested prior to the genome revolution. If we are going to try to search the genome for clues to what makes modern humans distinctive, it is likely that we cannot look to explanations involving one or a few changes.
The whole-genome approaches that became possible after the technological revolution of the 2000s also soon made it clear that natural selection was not likely to take the simple form of changes in a small number of genes, as Klein had imagined. When the first whole-genome datasets were published, many geneticists (myself included) developed methods that scoured the genome for mutations that were affected by natural selection.30 We were searching for the “low-hanging fruit”—instances in which natural selection had operated strongly on a few mutations. Examples of such low-hanging fruit include the mutations allowing people to digest cow’s milk into adulthood, or mutations that cause darkening or lightening of skin to adapt to local climates, or mutations that bequeath resistance to the infectious disease malaria. As a community, we have been successful in identifying selection on mutations like these because they have risen rapidly from low to high frequency, resulting in a large number of people today sharing a recent ancestor or striking differences in mutation frequency between two otherwise similar populations. Events like these leave great scars on patterns of genome variation that can be detected without too much trouble.
Excitement about this bonanza was tempered by work led by Molly Przeworski, who studied the types of patterns that natural selection is likely to leave on the genome as a whole. A 2006 study by Przeworski and her colleagues showed that genome scans of present-day human genetic variation will miss most instances of natural selection because they simply will not have the statistical power needed to detect it, and that scans of this type will also have more power to detect some types of selection than others.31 A study she led in 2011 then showed that only a small fraction of evolution in humans has likely involved intense natural selection for advantageous mutations that had not previously been present in the population.32 Thus, intense and easily detectable episodes of natural selection such as those that have facilitated the digestion of cow’s milk into adulthood are an exception.33
So what has been the dominant mode of natural selection in humans if not selection on newly arising single mutation changes that then rocket up to high frequency? An important clue comes from the study of height. In 2010, medical geneticists analyzed the genomes of around 180,000 people with measured heights, and found 180 independent genetic changes that are more common in shorter people. This means that these changes, or ones nearby on the genome, contribute directly to reduced height. In 2012, a second study showed that at the 180 changes, southern Europeans tend to have the versions that reduce height, and that this pattern is so pronounced that the only possible explanation is natural selection—likely for increased height in northern Europeans or decreased height in southern Europeans since the two lineages separated.34 In 2015, an ancient DNA study led by Iain Mathieson in my laboratory revealed more about this process. We assembled DNA data from the bones and teeth of 230 ancient Europeans and analyzed the data to suggest that these patterns reflected natural selection for mutations that decreased height in farmers in southern Europe after eight thousand years ago, or increased height in ancestors of northern Europeans who lived in the eastern European steppe lands before five thousand years ago.35 The advantages that accrued to shorter people in southern Europe, or to taller people in far eastern Europe, must have increased the number of their surviving children, which had the effect of systematically changing the frequencies of these mutations until a new average height was achieved.
Since the discoveries about height, other scientists have documented additional examples of natural selection on other complex human traits. A 2016 study analyzed the genomes of several thousand present-day Britons and found natural selection for increased height, blonder hair, bluer eyes, larger infant head size, larger female hip size, later growth spurt in males, and later age of puberty in females.36
These examples demonstrate that by leveraging the power of the whole genome to examine thousands of independent positions in the genome simultaneously, it is possible to get beyond the barrier that Molly Przeworski had identified—“Przeworski’s Limit”—by taking advantage of information that we now have about a large number of genetic variations at many locations in the genome that have similar biological effects. We have such information from “genome-wide association studies,” which since 2005 have collected data from more than one million people with a variety of measured traits, thereby identifying more than ten thousand individual mutations that occur at significantly elevated frequency in people with particular traits, including height.37 The value of genome-wide association studies for understanding human health and disease has been contentious because the specific mutation changes that these studies have identified typically have such small effects that their results are hardly useful for predicting who gets a disease and who does not.38 But what is often overlooked is that genome-wide association studies have provided a powerful resource for investigating human evolutionary change over time. By testing whether the mutations identified by genome-wide association studies as affecting particular biological traits have all tended to shift in frequency in the same direction, we can obtain evidence of natural selection for specific biological traits.
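The logic of such a test can be illustrated with a simple sign test. This is a deliberately simplified sketch, not the statistic used in the studies cited here: it assumes that the mutations are independent of one another and that, without selection, each is equally likely to have shifted in either direction. The function name and the example counts are hypothetical.

```python
from math import comb


def sign_test_p_value(num_shifted_same_direction: int, num_variants: int) -> float:
    """Probability of seeing at least this many same-direction shifts by chance.

    Assumption: with no selection on the trait, each variant's frequency is
    as likely to have gone up as down, independently of the others.
    """
    favorable = sum(
        comb(num_variants, k)
        for k in range(num_shifted_same_direction, num_variants + 1)
    )
    return favorable / 2 ** num_variants


# Hypothetical counts: of 180 trait-associated variants, 120 have the
# trait-reducing version more common in one population than in the other.
print(sign_test_p_value(120, 180))
```

No single variant needs to show a large shift for the total to be telling: many small shifts in the same direction are together far less likely by chance than any one of them alone.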
As genome-wide association studies proceed, they are beginning to investigate human variation in cognitive and behavioral traits,39 and studies like these—such as the ones for height—will make it possible to explore whether the shift to behavioral modernity among our ancestors was driven by natural selection. This means that there is new hope for providing genetic insight into the mystery that puzzled Klein—the great change in human behavior suggested by the archaeological records of the Upper Paleolithic and Later Stone Age.