THE PIONEERING ANCESTRAL POPULATION DATABASES
You will be seeing references to these databases in the rest of Part II:
The HGDP-CEPH database: Predating the sequencing of the genome, this database consists of lymphoblastoid cell lines from 1,050 individuals in 52 world populations. It was assembled by the Human Genome Diversity Project at Stanford University under the direction of Luigi Luca Cavalli-Sforza.
The Perlegen database: In 2005, researchers at Perlegen Sciences genotyped almost 1.6 million SNPs in 71 Americans of European, African, and Asian ancestry.
The HapMap Project: The first large database of ancestral populations, Phase 1 of the HapMap Project, released in 2005, genotyped a million SNPs in 269 individuals drawn from four populations—Yoruban in Nigeria, Japanese in Tokyo, Han in Beijing, and Utah residents of northern and western European ancestry.
1000 Genomes Project: This has been the most widely used database for analyzing ancestral populations since the release of Phase 1 in 2012. It is described in detail here.
Among the one-third of the sites that meet the technical definition of SNP (the minor allele frequency is at least .01), almost all will be present in at least one colonist, but their existence will be precarious because they are carried by only a few. Thus the genetic distinctiveness of the colonists will increase with each generation because, in each generation, thousands of SNPs will cease to vary because so few members of the crew carried the unusual variant and the coin flips went against those SNPs.
In addition to reducing diversity through SNPs that have ceased to vary (and therefore may no longer be called SNPs), genetic drift will produce different patterns among the SNPs that remain. Suppose we split up our crew into two groups of 50. At the time of the split, let’s say that a given SNP has a frequency of 50 percent in both groups. Twenty generations later, suppose it has drifted downward to 45 percent in one of the groups and upward to 55 percent in the other. It has been a completely random event without adaptive implications in both cases, but the two groups have nonetheless become genetically distinctive with regard to that particular SNP.
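The two-group thought experiment can be tried out in a few lines of Python. This is only a sketch under assumptions that are mine rather than the text's: a Wright-Fisher style model with constant group size, non-overlapping generations, two gene copies per person, and no selection. The names drift and simulate_split are hypothetical.

```python
import random

def drift(pop_size, start_freq, generations, rng):
    """Track one variant's frequency under pure random drift.

    Assumption (not from the text): a Wright-Fisher style model in which
    every gene copy in a generation is drawn at random from the previous
    generation's pool. No selection is applied.
    """
    copies = 2 * pop_size  # two gene copies per person
    freq = start_freq
    history = [freq]
    for _ in range(generations):
        # Each copy carries the variant with probability equal to its
        # frequency in the previous generation: the "coin flips".
        carried = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = carried / copies
        history.append(freq)
    return history

def simulate_split(seed=1):
    """Two groups of 50, both starting at 50 percent, followed for 20 generations."""
    rng = random.Random(seed)
    group_a = drift(50, 0.5, 20, rng)
    group_b = drift(50, 0.5, 20, rng)
    return group_a, group_b

if __name__ == "__main__":
    a, b = simulate_split()
    print(f"group A after 20 generations: {a[-1]:.2f}")
    print(f"group B after 20 generations: {b[-1]:.2f}")
```

Different seeds give different end points, which is the point: nothing adaptive is happening. Once a frequency reaches 0 or 1 in this model it stays there, because there is no mutation or migration to bring the lost variant back.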
I have described the process in terms of a spaceship crew centuries in the future, but the relevant population genetics theory has been established for decades. In 1943, Sewall Wright explored the mathematics of genetic transmission in a population distributed uniformly over a large area.4 It had long been accepted that populations separated by geographical barriers such as mountains or oceans would be genetically distinctive (the “island model”). Wright’s equations demonstrated that even within an area in which a population is evenly spread without geographic barriers, the parents of any given individual are drawn from a small surrounding region (the “isolation by distance model”). In 1948, Gustave Malécot demonstrated mathematically that population differences in gene frequencies may be expected to increase as a function of geographic distance.5 In 1964, population geneticists Motoo Kimura and George Weiss integrated these findings into what they labeled the “stepping stone model of population structure.”6 This model postulated that a population expands outward from a single geographic center and that occasionally a band splits off from the larger group. The result is a series of stepwise increases in genetic drift and decreases in genetic diversity within each band. The process came to be called a “serial founder effect.”
The model also predicted that the cumulative magnitude of genetic distinctiveness would tend to be associated with geographic distance from the original center, because these migrations would tend to be driven by the subpopulation’s need to find unoccupied territory. It may seem odd that such a need existed, given how few humans were around during the Pleistocene, but hunter-gatherers take up a lot of space—usually about 5,000 acres per person, although, depending on local conditions, a band of just 25 could require more than 1,000 square miles.7 By moving away from occupied territory, humans would also usually be moving farther geographically from the original center.
How the Earth Was Peopled
This body of theoretical work became especially relevant as paleontologists and then geneticists found compelling evidence for what is known as the Out-of-Africa explanation of human expansion. It started out simple. It is now exceedingly complex and becoming more so.
Around 6 million years ago, the first hominins diverged from chimpanzees, becoming fully bipedal sometime more than 4 million years ago.8 Homo habilis, who was bipedal and apparently used stone tools, appeared about 2.5 million years ago. About 2 million years ago came our likely direct ancestor, Homo erectus. Hominins first expanded out of Africa around 1.8 to 2.1 million years ago and eventually spread throughout Eurasia.9
LABELS FOR OUR ANCESTORS
Hominins: Refers to all branches of the human family, modern and extinct. Another word, hominid, formerly had this meaning, but in contemporary usage, hominid refers to all modern and extinct great apes, including us. Mnemonic for keeping them straight: Human and hominin both end with an n.
Homo erectus: Simplifying, the hominin that immediately preceded Homo sapiens.[10]
Homo sapiens: Us. All humans on planet Earth.
Anatomically modern humans: This term, abbreviated AMH in the technical literature, refers to archaic Homo sapiens who had globular brain cases and other physiological traits of Homo sapiens but did not leave behind substantial evidence of cultural accouterments (art, burials, ornament, musical instruments).11
Until the late 1980s, three theories competed to explain where and how Homo erectus became Homo sapiens.12 The oldest of these was Franz Weidenreich’s multiregional hypothesis, dating back to the 1940s, arguing that evolution from Homo erectus to anatomically modern humans happened contemporaneously throughout Africa and Eurasia but with continual gene flow across regions during the process.13 In the early 1960s, Carleton Coon countered with the “candelabra hypothesis,” arguing that anatomically modern humans had evolved along separate lines in Africa, Europe, and Asia.14 The third theory was the Out-of-Africa hypothesis. It emerged in the 1980s as paleontologists realized that the oldest fossil remains of anatomically modern humans were always being found in eastern Africa, not in Europe or Asia.
Genetics entered the debate in 1987 when geneticist Rebecca Cann used mitochondrial DNA, which is inherited solely from the mother, to argue that all of today’s humans are descended from a single female who lived sometime between 99,000 and 148,000 years ago.15 A year later, paleontologists Christopher Stringer and Peter Andrews combined the genetic evidence with the growing paleontological record to make the case that anatomically modern humans had evolved exclusively within Africa and only thereafter expanded to the rest of the world.[16]
By the end of the 1980s, the circumstantial evidence for the Out-of-Africa model had won over a majority of the scientists working on the problem, but definitive evidence required more detailed access to the genome. In 1991, population geneticist Luigi Luca Cavalli-Sforza of Stanford University initiated the Human Genome Diversity Project (HGDP). Geneticists around the world had been collecting blood samples and other data from different populations. Cavalli-Sforza’s idea was to assemble and augment these disparate sources of information, combining them into an integrated database. Eventually, the project brought together cultured lymphoblastoid cell lines from 1,050 individuals in 52 world populations. Preliminary results were included in Cavalli-Sforza’s magnum opus, 1,088 pages long, The History and Geography of Human Genes, published in 1994.17 The HGDP data broadly substantiated the theory that the human dispersal had indeed consisted of radiating expansions from a single center somewhere in Africa.
The finishing touch came in 2005 when scholars from Stanford, the University of Illinois, and the University of Michigan applied the newly acquired data from the sequenced human genome to rigorous genetic tests of two key questions: Was Africa the origin of the human dispersal? Was the peopling of the globe characterized by the “serial founder effect” (the loss of genetic diversity that occurs when a subpopulation breaks off from the main population)? The analyses were excruciatingly thorough. To take just one example, the authors performed regressions of genetic distance on geographic distance using each of 4,210 potential centers for human dispersal. By the end of their work, they could conclude confidently that “no origin outside Africa had the explanatory power of an origin anywhere in Africa” and that the geographic patterns were “consistent with a model of a serial founder effect starting at a single origin.”18
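The logic of testing many candidate centers can be sketched in Python. This is not the authors' procedure or data. Every site, candidate origin, and genetic value below is invented, distances are plain great-circle distances, and the genetic measure is constructed to grow linearly with distance from candidate A, so the scan should recover A. The names great_circle_km, r_squared, best_origin, and toy_measure are hypothetical.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def r_squared(xs, ys):
    """Share of the variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def best_origin(candidates, sites, genetic_measure):
    """Regress the genetic measure on distance from each candidate; keep the best fit."""
    scores = {}
    for name, coord in candidates.items():
        geo = [great_circle_km(coord, s) for s in sites]
        scores[name] = r_squared(geo, genetic_measure)
    return max(scores, key=scores.get), scores

# Invented sampling sites as (latitude, longitude); not real study locations.
TOY_SITES = [(10, 40), (30, 35), (50, 10), (35, 105), (-25, 135), (60, -150), (-10, -60)]
# Hypothetical candidate origins to compare.
TOY_CANDIDATES = {"A": (5, 38), "B": (40, 100), "C": (-15, -60)}

def toy_measure(true_origin):
    """Made-up genetic measure that rises linearly with distance from one origin."""
    return [1e-5 * great_circle_km(true_origin, s) for s in TOY_SITES]

if __name__ == "__main__":
    measure = toy_measure(TOY_CANDIDATES["A"])
    best, scores = best_origin(TOY_CANDIDATES, TOY_SITES, measure)
    for name, score in scores.items():
        print(f"candidate {name}: R^2 = {score:.3f}")
    print("best-fitting origin:", best)
```

The sketch keeps only the shape of the argument: each candidate center gets its own regression, and the centers are then ranked by how much of the genetic pattern distance from them explains.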
Since 2005, advances in genetics and new paleontological evidence have transformed the state of knowledge about the dispersal out of Africa and have also identified a number of new questions.19 Some uncertainty has emerged about exactly where anatomically modern humans arose. The consensus answer has long been East Africa, but now it is thought that other regions in the continent may have played a role. Recent fossil evidence for anatomically modern humans comes from Morocco and dates to 315,000 years ago, more than 100,000 years earlier than the previous oldest fossils.20 A team of physical anthropologists, archaeologists, and geneticists has argued that morphologically and technologically varied populations of Homo sapiens lived throughout Africa, supporting a view of “a highly structured African prehistory that should be considered in human evolutionary inferences, prompting new interpretations, questions, and interdisciplinary research directions.”21 But the core tenet of the Out-of-Africa theory, that hominins became Homo sapiens exclusively in Africa, remains uncontested.
The immediate consequence of the exodus was the spaceship effect. Within Africa, interbreeding continued even across substantial geographic distance. The genetic diversity that had already accumulated within Africa was largely conserved. The few hundred people who left Africa carried with them only a fraction of the total genetic diversity that existed there. In that sense, subsequent generations were guaranteed to be distinct from those who remained in Africa, if only because their descendants could not possibly carry the full range of traits that still occurred among the peoples who remained.
Theories about the exodus from Africa have their own uncertainties. More than one dispersal occurred, but by what routes? When? The archaeological record combined with recent paleogenomic evidence strongly suggests that an early wave or waves traveled from northern Egypt across the Sinai Peninsula and were probably present in the Levant around 200,000 years ago—much earlier than had previously been thought.22
Tens of thousands of years later, a southern exodus occurred, probably through the Bab-el-Mandeb Strait at the mouth of the Red Sea.23 The date of the exodus was formerly put at about 60,000 years ago, but emerging evidence points to an earlier date, perhaps as early as 120,000–130,000 years ago.24
Until 2016, the evidence for multiple dispersals led naturally to the assumption that at least some members of more than one wave survived. A plausible scenario was that an initial southern wave peopled Southeast Asia and “Sahul,” the name that has been given to the Pleistocene landmass that included today’s Australia, New Guinea, and Tasmania. It is now usually called Oceania by students of human populations. It was thought that a later northern wave spread through the Levant and peopled Europe, Central Asia, East Asia, and eventually the Americas via the Siberian bridge.25
In 2016, a new whole-genome study based on 300 genomes from 142 diverse populations provided evidence for a one-wave scenario, indicating that just one band of anatomically modern emigrants from Africa has descendants among today’s humans.26 The individuals in the study represented a larger and more globally representative set of populations than ever before, with genomes sequenced at a more precise level than ever before. But, as usual, there were complications. The genomes of Papuans in the study gave signs that about 2 percent of their genomes might have come from an earlier population. That’s not much, but it suggests something more complicated than a single band of emigrants.
One thing is sure: Homo sapiens was not spreading into an unpopulated continent. It is now accepted that, whenever modern humans reached Eurasia, at least two archaic hominins, the Neanderthals and Denisovans, were already in residence and that the anatomically modern newcomers interbred with both groups.27 The admixture with the Neanderthals was until recently dated to 50,000–65,000 years ago.28 Other evidence now suggests that introgression with Neanderthals began earlier, with one study finding that it could date as far back as 270,000 years, which opens up still another set of possibilities.29 It affected the genomes of both modern Europeans and East Asians.30 Admixture with Denisovans is found in proportions as high as 5 percent in Papuans, Melanesians, and Australian Aboriginals, and in far lower proportions in South, Southeast, and East Asians.31
The complications don’t stop there. There are the Hobbits to deal with—fossils from the three-foot hominins found on the Indonesian island of Flores.32 There are the apparent migrations from Europe back into North Africa.33 There’s the unexpected discovery that America was not peopled by a single migration across the Bering Strait but more likely by four separate prehistoric migrations, one of which is a mysterious population Y that has descendants in both the Amazon and in Australasia.34 In David Reich’s words, “the evidence for many lineages and admixtures should have the effect of shaking our confidence in what to many people is now an unquestioned assumption that Africa has been the epicenter of all major events in human evolution.”35
Reich’s 2018 book, Who We Are and How We Got Here, also recounts one of the most useful discoveries in modifying the traditional view of race. It is not the case that Europe was settled by emigrants from Africa who then adapted over tens of thousands of years to become the peoples we now identify as European whites. Europe was repeopled several times, as groups from various points in Central Asia and the Middle East displaced the existing populations. What we think of as European whites are indeed an amalgam. So are today’s East Asians, South Asians, sub-Saharan Africans, and Amerindians. Ancestral populations did not evolve quietly in isolation. Genetic ancestry is endlessly fluid and dynamic.
The study of the peopling of the Earth has both powerful new analytic methods and a mother lode of ancient DNA data that has barely been tapped. I won’t try to give a sense of the mainstream on many open questions, because everything is in such flux. Some accessible overviews are given in the note.[36] I should add that I have had to revise this account several times during the time it took to write Human Diversity because of new discoveries. There’s no reason to doubt that additional discoveries are on the way over the next several years. Stay tuned.
The Correspondence Between Genetic Differentiation and Self-Identified Race and Ethnicity
The maps in The History and Geography of Human Genes revealed for the first time that genetic differentiation of populations showed a continental pattern. This should not have been surprising—if humans began in Africa, population genetics theory predicted that the differentiation would increase along with geographic distance from Africa. But it was nonetheless jarring to see how closely the clusters corresponded with traditional definitions of races at the continental level. The five continents in question were Africa, Europe, East Asia, the Americas, and Oceania.
The First Cluster Analyses of Genetic Distinctiveness Across Populations
During the 1990s and 2000s, Stanford’s Human Genome Diversity Project produced a series of cluster analyses that successively expanded on the patterns reported in The History and Geography of Human Genes. The first was published the same year, 1994. Cavalli-Sforza collaborated with other scholars—Anne Bowcock was the first author—to analyze a particular type of microsatellite at 30 places on the genome, using a sample covering 14 populations. They used cluster analysis to explore the ways in which the 14 populations fell into groups.
FOUR THINGS TO REMEMBER ABOUT CLUSTER ANALYSIS
Many kinds of statistical cluster analysis are routinely used by disciplines in both the hard and the soft sciences. They all have the same generic purpose: to see whether the members of a sample can appropriately be parsed into groups. The choice of how many groups is specified by the analyst. The usual procedure is to instruct the statistical software to produce K clusters, beginning with K = 2 and repeating it for incremental values of K as long as the clusters being produced continue to be informative.
Geneticists use a variety of statistical techniques to assess clustering. They fall into two broad categories: distance-based methods and model-based methods.37 The ones I will be discussing are all distance-based, bringing no preconceptions to the analysis. The statistical theory and the computational algorithms for cluster analysis are complex, and trying to describe them here would be overkill.38 Just remember four things about all of the methods that pass methodological muster:
1. A distance-based cluster analysis does not artificially force clusters on the basis of some a priori categorization. The software is trying to find the best statistical fit for the raw data; that’s all.
2. The software will dutifully identify whatever number of “clusters” it is told to produce, but the output of the software also usually makes it easy for the investigators to see that the results aren’t really clusters in any substantive sense.
3. Cluster analysis is exploratory. It is standard procedure for the investigators to run the cluster analysis several times, specifying incremental numbers of clusters and asking what differences among the subjects correspond to the statistical clusters.
4. When dealing with human populations, the clusters do not define “racial purity.” That an individual falls into a single cluster with no admixture indicates statistical coherence given the value of K that happens to be in use for that run. Depending on the number of polymorphic sites in the analysis and the value of K, an individual can fall into a single cluster with no admixture with one value of K and yet show a membership in more than one cluster with another value of K. The only consistent aspect of cluster analyses of human populations is that the clusters do fall along geographical lines—which is not a product of the software but is consistent with the population genetics theory that antedated the tools for conducting cluster analyses.
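The routine of asking for K clusters and then raising K can be sketched in Python. The caveats are substantial. The example uses plain k-means, a simple distance-based technique chosen here for brevity, not the software used in the studies discussed. The data are invented two-dimensional points rather than genotypes. The names kmeans and toy_samples are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point seeding.

    Returns one cluster label per point and the total squared distance
    of points from their cluster centers (smaller means a tighter fit).
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    fit = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, fit

def toy_samples(seed=0, per_group=20):
    """Invented data: three groups of points scattered around distinct centers."""
    rng = random.Random(seed)
    group_centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
    points = []
    for cx, cy in group_centers:
        for _ in range(per_group):
            points.append((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)))
    return points

if __name__ == "__main__":
    samples = toy_samples()
    # The analyst, not the software, chooses K; repeat for incremental values.
    for k in (2, 3, 4):
        labels, fit = kmeans(samples, k)
        print(f"K = {k}: clusters used = {len(set(labels))}, total squared distance = {fit:.1f}")
```

The code returns K groups whatever K is requested, which mirrors the second point above. In a toy setup like this one, the analyst would expect the fit to improve sharply up to the number of groups actually built into the data and only marginally beyond it, and that kind of comparison is how an investigator judges whether an additional cluster is still informative.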