Wednesday, July 29, 2015

The ancient DNA case against the Anatolian hypothesis


In the debate over the location of the Proto-Indo-European urheimat, Colin Renfrew's Anatolian hypothesis is usually mentioned as the most viable alternative to the steppe or Kurgan hypothesis. But probably not for very much longer.

Below is a Principal Component Analysis (PCA) featuring extant Indo-European and non-Indo-European groups from West Eurasia, a couple of typical early Neolithic farmers from Central Europe, a typical Western Hunter-Gatherer, also from Central Europe, and the Iceman from the Copper Age Tyrolean Alps, again typical of his time and place.*

It's just a taste of the ancient genomic data we have available from prehistoric Europe, but it has almost everything that is pertinent to the issue at hand.


You don't need to be familiar with PCA methodology to be able to read the plot. Basically, it shows that the present-day European population structure is the result of two main events:

- the arrival of early farmers from Anatolia during the Neolithic transition, which eventually led to the extinction of people like the Western Hunter-Gatherer, who is the most obvious outlier on the plot

- the expansion of Kurgan groups such as the Yamnaya, who may have been the ancestors or perhaps cousins of the Corded Ware people, from the western steppe during the Late Neolithic/Early Bronze Age, which shifted the genetic structure of almost all Europeans to the east, away from the Neolithic and Copper Age samples.

These were massive population turnovers, and, as a rule, massive population turnovers are accompanied by language change. So it's highly unlikely that any Europeans today are speaking languages derived from the languages of the Western Hunter-Gatherers or early Neolithic farmers from Central Europe. Moreover, consider this:

- most present-day Indo-European speaking Europeans form an elongated cluster between the Neolithic farmers and the Corded Ware sample, pointing to the steppe-derived Corded Ware Culture as the proximate agent of the Indo-European expansion in much of Europe

- the only present-day Europeans who closely resemble Neolithic farmers are some Sardinians (the small Romance cluster just above the two Neolithic samples), but Sardinians spoke Paleo-Sardinian or Nuragic languages until they adopted Indo-European speech, in the form of Latin, from the Romans.

Also, this isn't shown on the plot, but the dominant Y-chromosome haplogroup of early Neolithic farmers is G2a, which is a low frequency marker in Europe today. The two most common Y-chromosome haplogroups among present-day Europeans are R-M198 and R-M269, which are also typical of Corded Ware and Yamnaya males, respectively, and probably originally from the steppe.

All this raises the question: is there any way to rework the Anatolian hypothesis so that it can be salvaged? I doubt it. Even making the steppe a homeland for all of the main Indo-European language groups apart from Anatolian and Armenian doesn't appear to be a viable option.

It is true that the Yamnaya nomads carried Near Eastern-related ancestry which may represent Proto-Indo-European admixture from outside of the steppe. But there's no evidence that it came from Anatolia.

In fact, if Neolithic Anatolians were basically identical to early Neolithic European farmers, which seems to be the case (see here), then it's unlikely that it did, because the latter carried a peculiar genome-wide signal that is missing in Yamnaya samples (orange cluster in the ADMIXTURE bar graph below). Heck, even the early Corded Ware genomes from Germany barely show any of it.

I won't go into the linguistic arguments for why the Anatolian hypothesis is implausible. But it might be worth checking out a new book on the topic by linguists Asya Pereltsvaig and Martin W. Lewis: The Indo-European Controversy: Facts and Fallacies in Historical Linguistics. I haven't read it yet, so I welcome the opinions here of those who have. I did, however, read a lot of the online articles on which the book is based. As far as I know, most of them are still available here and here.


*Another version of the same PCA, with the samples labeled individually, is available here. The samples are listed here. All of the samples are from Haak et al. and Allentoft et al. The PCA was run using ~56K high confidence SNPs listed here.

The Corded Ware sample is a composite of Corded Ware sequences from present-day Germany, Scandinavia, Estonia and Poland. The Yamnaya sample is a composite of Yamnaya sequences from the Samara and Rostov regions of present-day Russia.

I chose to use these composites instead of individual sequences because I didn't want to include any samples in the analysis with genotype rates of less than 98%.
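To make the genotype-rate cutoff concrete, here is a minimal sketch of how a composite sample might be built from several low-coverage individuals. The post doesn't say how the composites were actually made, so the merging rule below (take the first non-missing call at each marker) is purely an assumption, and the function names and toy genotypes are hypothetical.

```python
from typing import List, Optional, Sequence

# 0, 1 or 2 copies of the alternate allele; None marks a missing call.
Genotype = Optional[int]


def genotype_rate(calls: Sequence[Genotype]) -> float:
    """Return the fraction of markers that have a non-missing call."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def make_composite(individuals: Sequence[Sequence[Genotype]]) -> List[Genotype]:
    """Merge individuals marker by marker.

    Assumed rule (not taken from the post): at each marker, use the first
    non-missing call found among the individuals, in the order given.
    """
    n_markers = len(individuals[0])
    composite: List[Genotype] = []
    for i in range(n_markers):
        call = next((ind[i] for ind in individuals if ind[i] is not None), None)
        composite.append(call)
    return composite


# Toy data: three hypothetical individuals typed at five markers.
ind_a = [0, None, 2, None, 1]
ind_b = [None, 1, None, None, 1]
ind_c = [2, 0, 0, 1, None]

composite = make_composite([ind_a, ind_b, ind_c])
passes_cutoff = genotype_rate(composite) >= 0.98
```

In this toy case none of the three individuals would pass a 98% cutoff on its own, but the composite has a call at every marker and does.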

Sunday, July 26, 2015

Global PCA of selected Late Neolithic/Bronze Age Eurasians


I was curious what the Bronze Age steppe and Corded Ware genomes from the Rise dataset would look like on Principal Component Analysis (PCA) plots alongside populations from across the globe. Ten genomes had enough high confidence (transversion) markers to be analyzed accurately in such a way. I also ran an Iron Age Swedish sample, just to see how it differed from the older genomes.
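For anyone wondering what restricting the analysis to transversion markers looks like in practice, here is a rough sketch. Ancient DNA work often sets aside transition SNPs (A<->G, C<->T) because post-mortem damage tends to show up as C-to-T and G-to-A changes. The marker IDs and the tuple layout below are made up for illustration, and this isn't necessarily the exact filter used for these plots.

```python
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# (marker ID, reference allele, alternate allele); hypothetical layout.
Snp = Tuple[str, str, str]


def is_transversion(ref: str, alt: str) -> bool:
    """True if the change swaps a purine for a pyrimidine or vice versa."""
    ref, alt = ref.upper(), alt.upper()
    return (ref in PURINES and alt in PYRIMIDINES) or (
        ref in PYRIMIDINES and alt in PURINES
    )


def keep_transversions(snps: List[Snp]) -> List[Snp]:
    """Drop transition SNPs and keep only transversions."""
    return [snp for snp in snps if is_transversion(snp[1], snp[2])]


# Toy marker list with invented IDs.
markers = [
    ("snp1", "C", "T"),  # transition, dropped
    ("snp2", "A", "C"),  # transversion, kept
    ("snp3", "G", "A"),  # transition, dropped
    ("snp4", "G", "T"),  # transversion, kept
]

filtered = keep_transversions(markers)
```

The obvious cost is fewer usable markers per genome, which is presumably why only some of the samples had enough of them to be analyzed this way.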

Click on the links to go to my drive to download the plots. If you're having trouble finding the ancient samples, type their IDs into the PDF search field and hit enter.

RISE509_Afanasievo
RISE511_Afanasievo
RISE500_Andronovo
RISE505_Andronovo
RISE00_Corded_Ware
RISE94_Corded_Ware
RISE493_Karasuk
RISE496_Karasuk
RISE548_Yamnaya
RISE552_Yamnaya
RISE174_Iron_Age_Scandinavia

I can't see any major surprises. But I do find it remarkable how very European the Andronovo individuals appear on these plots. Keep in mind that they're ~3,000-year-old samples from the Altai region of Russia. Their ancestors probably emigrated there from the Trans-Urals steppe sometime during the Middle Bronze Age.

The Andronovo Culture was succeeded in the Altai region during the Late Bronze Age by the Karasuk Culture, which was probably a new composite of local and perhaps foreign groups. Interestingly, the Karasuk samples featured above are obviously of mixed European/East Asian origin.

Note also that the Afanasievo and Yamnaya individuals fall outside the range of present-day European variation in many of the dimensions, basically as if they were pulling towards the Karitiana Indians of the Amazon. No doubt, this is their excess ANE talking.

By the way, I recently ran some of the same samples in PCA limited to West Eurasian populations. You can see the results here.

Wednesday, July 22, 2015

High-res R1b tree featuring 16 ancient sequences


Here's a useful R1b phylogenetic tree that was posted recently at the R1b-M269 (P312- U106-) DNA Project site.


If these results are correct (and judging by the quality of work at the aforementioned R1b project, I'm pretty sure they are), it would appear that the Samara hunter-gatherer, marked I0124, was not directly ancestral or even all that closely related to any of the Yamnaya/Pit-Grave samples from the North Caspian region (each one also marked with an I~ ID).

On the other hand, the North Caspian Yamnaya sequences are very similar to the rest of the Yamnaya sequences, which come from just north of the Caucasus (marked RISE~). Indeed, all of these Yamnaya samples are almost identical in terms of genome-wide genetic structure (see here).

What this suggests is that the Yamnaya nomads emigrated to the North Caspian from somewhere near the Caucasus, or they were the descendants of such migrants. And if we assume that their ancestral homeland abutted the territory of the Maikop Culture, as shown on this map from Dolukhanov 2014 (look for 9 - early Pit-graves), it becomes easy to understand why they carried such significant maternal and genome-wide genetic Caucasus-related admixture (usually estimated at around 50%).

However, if you're one of those online Near Eastern patriots who like to imagine the Yamnaya as your own, please don't jump for joy just yet. The Yamnaya nomads still look very much like a people native to the western steppe, and this is probably also where their R1b comes from.

Sunday, July 19, 2015

The real thing


A couple of years ago Moorjani et al. concluded that present-day Georgians of the Transcaucasus were the best available proxy for the ancient West Eurasian population that mixed into the South Asian gene pool.

This was a solid statistical fit. And you can see on the TreeMix graph below, featuring a Georgian and a Kalash, why it worked so well.




But it was also a big fat coincidence, because check out what happens when I add another migration edge to the same graph.




Thus, the Indo-Iranian and hence Indo-European speaking Kalash no longer looks very similar to the Kartvelian speaking Georgian. In fact, he appears to be most closely related to the supposedly Indo-European speaking Afanasievo and Yamnaya nomads of the Early Bronze Age Eurasian steppe. The rest of his ancestry is probably best described as South Central Asian, which is an unknown quantity to me at this stage, but probably in large part of indigenous South Asian origin (see here).

I'm only able to show this thanks to the ancient samples that are on the tree, for which, as far as I know, there aren't any useful substitutes among present-day populations. Obviously, Moorjani et al. didn't have this luxury, so they ended up with a model that was statistically sound, but didn't make much sense otherwise, especially in terms of linguistics.

My TreeMix model is easily reproducible with most of the other South Asian samples from the Human Origins dataset, and it gels nicely with uniparental marker data too. For instance, here's a close-up from a similar graph featuring a Pathan, with a few extra details.




Yep, not only do Pathans cluster among these ancients of the Eurasian steppe, but most of them also carry the same Y-chromosome haplogroup: R1a-Z93, which is derived from R1a-M417, and in all likelihood first expanded in a big way with the Proto-Indo-Iranians of the Trans-Ural steppe.

By the way, the Human Origins dataset has four different sets of Gujarati samples from Houston, USA, marked A, B, C and D, and each one shows a different level of ancient steppe admixture as inferred with my test. GujaratiA scores around 50%, while GujaratiD scores only 40%. Does anyone know why these Gujaratis were grouped in such a way? Was it based on genetic structure or caste origin?





Full output from the analysis above is available in a zip file here. The reference samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015.

See also...

Population genomics of Early Bronze Age Europe in three simple graphs

Friday, July 17, 2015

Iron Age and Anglo-Saxon genomes from eastern England (Schiffels et al. preprint)


I haven't read this properly yet, but the results appear to be very similar to those I obtained with some of the same ancient genomes (see here), which must be very heartening for the authors (j/k). By the way, it's interesting to note that the word Celtic doesn't appear anywhere in the paper. I wonder why?

British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, ranging from 2,300 until 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

Schiffels et al., Iron Age and Anglo-Saxon genomes from East England reveal British migration history, bioRxiv, Posted July 17, 2015. doi: http://dx.doi.org/10.1101/022723