
Out of Eden: The Peopling of the World


by Oppenheimer, Stephen


  The language story

  Before I discuss those possible routes and what the genes can tell us about them, I would like to bring in another discipline that has always offered promises of great insight into the past: language history. Sadly, academic temperament has again helped to inject more confusion than clarity.

  Comparative or historical linguistics has as hoary a role in speculations on the first Americans as does archaeology. In each case the nation’s most revered founding father and well-meaning polymath played a part. In 1784, Thomas Jefferson directed the controlled excavation of an ancient mound in Virginia, the first scientific excavation in the history of American archaeology. Four years earlier, he had begun pioneering efforts to collect standardized vocabulary lists for Native Americans. He wanted to trace their origins through comparative linguistics. By 1809 he had several dozen lists, but they were stolen and almost totally destroyed while his possessions were being transported. Jefferson recovered the surviving fragments and sent them to the American Philosophical Society. America thus had a head start on most nations in gazing through this promising yet opaque window onto our past.23

  What was it Jefferson wanted to do with language? His idea itself was simple in concept but well nigh impossible to apply in reality. It goes like this: over time, languages split and branch. If this represents expanding peoples splitting into bands, then we should be able to date the splits and place them on a map. This should then give us a family tree of migrations. For example, European languages such as German, French, Spanish, and English all come from a common Indo-European stock. By comparing them using certain rules, linguists can show that they have branched progressively from that stock via dead ancestors such as Latin and proto-Germanic. If the whole European language tree could be reconstructed – which, though there are certain areas of contention, is possible – the hope is that it might then be possible to trace geographic population splits.

  So, if we then lay out the language tree on the map, we find, for example, that French and Spanish come from Latin and, before that, from proto-Italic. English and German came from some proto-west-German root. All the languages from this European family eventually join up with a proto-Indo-European root thousands of years ago. Now one can start drawing big arrows on the map from A to B and C, from C to D, and so on, and write the history of European colonization as told by the languages.

  The trouble is that it does not work quite like that. Languages do not change only as a result of population splits and random change. For example, the roughly 15 per cent of English that is not Germanic comes from French. That French vocabulary was imported by a very small number of Norman nobles a thousand years ago, after the Norman conquest – a phenomenon called language borrowing. The French intrusion into English came about via the dominance of a small Norman elite, not through massive Norman migration. French is an Italic language, but no one would suggest on that basis that the French are all direct descendants of Roman invaders. The French language changed from Celtic to a Romance language under the influence of the Roman Empire. That is called language shift. Language borrowing and shift can take place with only a minimal movement of people, which spoils simplistic models of language migrating synchronously with people.

  Of course, Europe might not provide the best linguistic comparison with the USA. It has been occupied by modern humans for 45,000 years or more, and there has been a lot of complex internal population movement. But there are a few good examples of people moving and taking only their own language with them in such a way that the language tree can trace the migration history and indicate the source of the migration. These good examples are almost always of migrations into previously unoccupied territory, such as the Polynesians’ spread through the empty islands of the deep Pacific. The small subfamily of Polynesian languages really does look like a tree and does recapitulate their spread as tracked by the archaeological record. What is more, the genetic picture fits the linguistic trail very well within Polynesia.24

  The colonization of the Americas could actually be regarded as similar to the conquest of the Pacific, in that people would have been moving into a huge virgin territory – a New World – and spreading out and separating like rays from a star. The trouble is that the Americas were colonized a long time before Polynesia, which was occupied progressively only between 800 and 3,500 years ago.

  Great time depth has critical effects on language change and limits the possibility of reconstruction. Not only is it extremely difficult to date the splits in languages, but also most linguists feel that because of the inevitable decay in detectable relationships between words, language families cannot be reconstructed or traced further back than about 6,000–8,000 years.25 This is only slightly over half the time since Clovis, so reconstructing a unified New World language tree back to its base is a major problem. If Native American language families can be traced back only 7,000 years, their tree will be missing all its roots and lower branches: we would be looking at a mess of prunings, with little chance of fitting them together as a tree. And indeed this is the case with North, Central, and South America, which are characterized by many language families, over a hundred in all, consisting of about 1,200 languages. To most linguists, these cannot be drawn together into any semblance of a single tree, and even a few trees would be ambitious. Attempting to find out how many individual language branches entered America on this basis would thus be an educated guess at best. That has not stopped some people from trying.

  The Australian linguist Robert Dixon has estimated that about a dozen separate groups speaking different languages entered the Americas between about 12,000 and 20,000 years ago. American linguist Johanna Nichols, known for her deep-time analysis of world language change, reckons that about 35,000 years of occupation of the New World, based on multiple separate original entries from Asia into America, is necessary to explain the present diversity of American language families. The English linguist Daniel Nettle, in contrast, sees the Americas as having reached their climax of family diversity within the last, say, 12,000 years.26

  There is a big problem in using numbers of language families to estimate time depth. This is because, apart from the lack of agreement on time calibration, not all historical linguists are agreed on how to define a family of languages. The result is that numbers of families vary from linguist to linguist and country to country. Historical linguists can generally be divided into ‘splitters’ and ‘lumpers’, the former generating large numbers of families of a few languages, and the latter favouring small numbers of super-families containing large numbers of languages. At one extreme is the veteran American linguist Joseph Greenberg, who has famously gone much further than most other American linguists find acceptable by claiming that he can divide all Native American languages into three founder groups. At the other extreme we find classifications with over 160 families. The majority of American linguists tend to the higher rather than the lower figure.

  In calculating the overall age of American languages (mentioned above), Johanna Nichols has estimated that there are 167 American language ‘stocks’ (groups of languages that can be reconstructed or related back to a common node or branching point) using the strict rules of comparative linguistics. Nichols’ estimates of the number of American language stocks are intended for comparison with language diversity in other countries. She argues that in a region such as a continent or subcontinent which is isolated from outside influences, the number of stocks increases as a simple function of time.27

  There are difficulties with all these analyses based on subclassifications. First, there is no fully agreed approach to reducing numbers of languages to stocks or families, and even Nichols’ definition of ‘stock’ is likely to be subjective in its application. Second, the number of individual languages per stock varies from country to country and continent to continent, for reasons that are themselves contentious.28

  Still, whichever way one looks at it, in terms of stocks or languages, South America has far greater linguistic diversity than North America. From a statistical point of view, when exploring unknown mathematical relationships it is generally safer to use the simplest raw measurement for analysis rather than some derived value, which may carry further unknowns. The raw data in this instance are the numbers of languages rather than numbers of stocks.

  I have taken the liberty of expressing the data used by Nichols and Nettle as a simple ‘numbers and time’ graph (Figure 7.2). I have plotted numbers of the world’s languages per region (grouped by subcontinent/continent, as in Nichols’ original dataset) according to estimated dates of occupation, using an intermediate dummy average age of 16,000 years for each of North, Central, and South America. The result is rather straightforward: as Nichols found, there is a simple straight-line relationship between age of occupation and numbers of languages in each region. The exception is Australia, which has always been recognized to have a lower than expected diversity for its age of occupation, being dominated by one language family. This relationship is so clear that we can put Nichols’ regions of North, Central, and South America back into the equation and predict their respective approximate ages of occupation. The result is interesting, almost a caricature, because the crude linguistic age in North America comes out at a Clovis-like 13,000 years, in Central America at a glacial 20,500 years, and in South America at an Early Upper Palaeolithic 32,500 years. We should not put much store in the accuracy of these figures, but they do provide us with a clear ranking in age from south to north. Such a ranking is consistent with the Americas having been colonized before the last ice age. North America would have been largely depopulated during the ice age and would therefore show the effects of re-expansion after the LGM, with fewer languages and an inferred date coincident with Clovis. South America, largely unaffected by the ice age, would have continued to generate new languages right through, thus looking its age. Central America, receiving migrations from North America during the re-expansion, would be intermediate in diversity between the other two.29

  Figure 7.2 Plot of regional language numbers in the Americas and worldwide. Note how the points for the Americas are as predicted by the regression equation. The two outliers can be explained on the basis that Australia has only one main family, and South and Southeast Asia should really have separate entries but are shown together as in Nichols’ original data-set.26–28
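  The ‘numbers and time’ reasoning behind the graph can be sketched in a few lines of code. What follows is only an illustration under stated assumptions: the calibration points are made-up placeholders, not the figures of Nichols or Nettle, and the function names fit_line and predict_age are hypothetical. It fits a straight line to (age of occupation, number of languages) pairs for regions of known age, then reads the line backwards to turn a language count into a crude age.

```python
def fit_line(ages, counts):
    """Ordinary least-squares fit of: count = intercept + slope * age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_count = sum(counts) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (c - mean_count) for a, c in zip(ages, counts))
    slope = sxy / sxx
    intercept = mean_count - slope * mean_age
    return slope, intercept


def predict_age(count, slope, intercept):
    """Invert the fitted line: the age at which the line reaches `count`."""
    return (count - intercept) / slope


# PLACEHOLDER calibration points (age in years, number of languages).
# These values are invented for illustration and are NOT real data.
calibration = [(10_000, 100), (20_000, 200), (40_000, 400), (60_000, 600)]
ages = [age for age, _ in calibration]
counts = [count for _, count in calibration]

slope, intercept = fit_line(ages, counts)

# A hypothetical region with 300 languages and an unknown date of occupation.
example_age = predict_age(300, slope, intercept)
```

  With real regional counts in place of the placeholders, the same inversion would yield the kind of crude ranking described above. Such a calculation supports a ranking of the regions by age more safely than it supports the absolute dates.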

  Since South America must have been colonized through the narrow bottleneck of Panama from North America (see below), the South is effectively yet another New World. So a first entry into South America before the last glaciation seems to be the simplest and most obvious interpretation of the south appearing to be linguistically older than the north. The low language diversity in North America, particularly the far North, may be a consequence of depopulation at the LGM, followed by re-expansion of a few languages during the big melt. As we shall see, this re-expansion model has resonance in the genetics.

  A northern enclave

  While very few American linguists recognize Greenberg’s largest American language family, Amerind, which he claimed encompasses 97 per cent of all American languages, there is little dissent about the other two groups which are found in the far north of America, namely Na-Dene and Inuit-Aleut. Thereby hangs a long-running tale. In 1986 Greenberg got together with a geneticist and a specialist on dental variation. The three of them published a theory which came to be known as the ‘Greenberg hypothesis’,30 which was that the synthesis of the three disciplines told of three separate migrations into the New World, distinct in dental morphology, genes, and language. The first of these migrations carried the ancestors of those Native Americans who speak the huge number of languages Greenberg classified as Amerind. The second, Na-Dene, carried a conglomerate of north-western and coastal languages such as Athapascan, Haida, and Tlingit, while the third carried the languages of the Aleutian islands and the Arctic – Inuit and Aleut. The Greenberg synthesis was conservative as far as dates were concerned, suggesting that these three linguistic groups arrived respectively 11,000, 9,000, and 4,000 years ago.

  Greenberg’s neat, simplistic reductionism appealed at the time to geneticists thinking and working on the peopling of the Americas. Not so to most American linguists, who long before had rejected almost unanimously Greenberg’s ‘lumping’ methods. Some linguists argued that the diversity of American languages and their stocks means separate migrations, while the more cautious have just disputed the integrity of the Amerind group and any conclusions based on the assumption of its reality.31

  How many founding genetic lines? How many migrations?

  The gulf between linguists and geneticists has widened rather than narrowed since the late 1980s as more has been learnt about the genes carried by Native Americans. Rather than accept that the Americas were colonized by several migrations, geneticists have tended to reduce Greenberg’s magic number of three to two, then to just one migration. They also parted company with both Greenberg and the archaeologists on the dating of the first migration, initially coming up with estimates of as long ago as 50,000 years, which had even some of the pre-Clovis archaeologists and ‘deep-time’ linguists gasping. A multidisciplinary synthesis is difficult enough if the disciplines are singing different songs, but if they are in different auditoriums . . .

  We need to look at some of the developments in Native American genetics since 1990 to understand how this divergence of views came about. In 1991, the late New Zealand geneticist Ryk Ward sequenced a small number of mitochondrial DNA types from the Nuu-Chah-Nulth (Amerind-speaking peoples from the American north-west) and identified four clusters or lines of sequences. Ward calculated that they came together as one line between 41,000 and 78,000 years ago. With foresight, he correctly inferred that this original branch point must have been long before in Asia. This meant that multiple lines must have entered America. Ward’s interpretation of four founding lineages started geneticists speculating on numbers of migration events, which they continue to do to this day.32

  In 1993, the Japanese geneticist Satoshi Horai argued that four lineages could mean four migrations.33 This symmetrical ‘one line equals one migration’ view has been regarded by some as too simplistic an interpretation of founder effects and American genetic diversity, but that is not the last word on the matter. By this time, however, other geneticists were already suggesting that even three migrations was one too many.

  Genetic geography and the Americas

  Their early mtDNA results suggested to American and Italian geneticists Douglas Wallace and Antonio Torroni and colleagues that, while the Amerinds had arrived around 20,000 years ago, the Na-Dene speakers of the north-west coast and Alaska arrived later, around 6,000–10,000 years ago. The next year (1993), Torroni and Wallace further clarified the mtDNA types that had been present in the colonization of America. They performed a high-resolution analysis on a sample of 527 Native Americans from twenty-four ethnic groups throughout America (excluding only Inuit), 404 Siberians from ten ethnic groups, and 106 other East Asians.34

  This was the first attempt ever at drawing up formal ‘phylogeographic’ rules to identify original ‘founder genetic types’. Torroni and Wallace argued that:

  1 True founding mtDNA types should be at the root of their own particular branch, because all subsequent types in that branch would have originated from them.

  2 It should still be possible to detect the founding mtDNA types in the region that was the source of migration, in this case East Asia.

  3 Subsequent or daughter mtDNA types in a founding American cluster should be unique to America and should not be found in Asia.
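  The three rules can be read as a simple test applied to a branching tree of mtDNA types. The sketch below is a loose illustration and not Torroni and Wallace’s actual procedure: the tree, the type names, and the functions descendants and is_founder are all hypothetical, chosen only to show how the three conditions combine.

```python
def descendants(root, children):
    """All types below `root` in the tree, excluding `root` itself."""
    found = set()
    stack = list(children.get(root, ()))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, ()))
    return found


def is_founder(candidate, cluster_root, children, asian_types):
    """Apply the three rules to one candidate type in one American cluster."""
    at_root = candidate == cluster_root          # rule 1: root of its branch
    in_source = candidate in asian_types         # rule 2: still found in Asia
    daughters_unique = descendants(candidate, children).isdisjoint(asian_types)  # rule 3
    return at_root and in_source and daughters_unique


# HYPOTHETICAL toy cluster: a root type with daughter and granddaughter types.
children = {"A_root": ["A1", "A2"], "A1": ["A1a"]}
asian_types = {"A_root"}  # types assumed to have been sampled in Asia

root_passes = is_founder("A_root", "A_root", children, asian_types)
daughter_passes = is_founder("A1", "A_root", children, asian_types)
# If a daughter type also turned up in Asia, rule 3 would fail for the root.
root_with_shared_daughter = is_founder("A_root", "A_root", children, {"A_root", "A2"})
```

  In this toy case only the root type passes all three rules. A daughter type fails rule 1, and the root itself fails rule 3 as soon as one of its daughters is also found in Asia.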

  Torroni and Wallace identified four main founder clusters throughout Amerind populations, each of which seemed to have been founded by a single mtDNA type. These clans (strictly, haplogroups), designated A–D, overlapped with three of Ward’s groups. Just one of these clusters, A, was found in the northern Na-Dene speakers. Also, the identifying mutations of the founder types were the same for the sister clusters found on the other side of the Bering Strait, in East Asia. (In Chapters 4 and 5 we came across these four groups in Asia – but more of that later.) Although the Americans and East Asians share these four genetic groups in general, the specific daughter American types of the four founders were not found in Asia. In other words, they had probably evolved further in America. These findings were consistent with the three rules given above, and meant that the four clusters had come across from Asia as founders already possessing their A, B, C, and D identities.35 (Figure 7.3)

  Torroni and Wallace also argued that mtDNA sub-types were specific to and typical of particular Native American groups. This suggests that tribal isolation had begun early and that gene-flow between tribes has been limited ever since. All of the groups except Group B could be found in Siberia, suggesting that this region might be the original source, or at least that eastern Siberia and America had a common source.

  Figure 7.3 Suggested pre-glacial entry of mtDNA founder lines into North America. The genetic evidence suggests entry of all five founders via Beringia during this period. I have suggested dual coastal/corridor routes for groups X and B, although B may only have been coastal.36–44

  With the likely founder types identified, it was then possible to use the molecular clock to estimate the founding date for each founder cluster. There were surprising results: Groups A, C, and D appeared to be very old (20,000–41,000 years old in America), certainly much older than Clovis, while B appeared to be nearer the age of Clovis. At the time, Torroni and Wallace took these results to mean an early pre-Clovis entry of Amerinds, with a possible later entry of Group B. They suggested a distinct origin for the Na-Dene and the Inuit-Aleut. Although they were careful this time not to specify an exact number of migrations, they gave the impression that there were three.36

 
