Tuesday, May 5, 2015

The LN/EBA: like one big party

I was messing around with some D-stats today, mainly looking at the potential survival and rebound of EHG, SHG and WHG, and came up with a fascinating, in my opinion at least, set of results:

At the most basic level, we can interpret D-stats like this: if the Z-score is +ve, then the gene flow occurred between W and Y or X and Z; if it's -ve, then the gene flow occurred between W and Z or X and Y. However, we can also consider (Y,Z) a clade with respect to (W,X) if the D statistic doesn't deviate significantly from 0.
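For anyone who wants to see the arithmetic behind that sign rule, here's a rough Python sketch of a D statistic of the form D(W,X;Y,Z) computed from per-SNP allele frequencies, plus a Z-score from a simple delete-one block jackknife. The function names, the equal-sized contiguous blocks and the unweighted jackknife are my assumptions for illustration; dedicated software handles blocks and weighting more carefully, so treat this as a sketch of the idea and not as the exact procedure behind the results above.

```python
import math

def d_statistic(w, x, y, z):
    """D(W,X;Y,Z) from allele frequencies; each list holds one value per SNP."""
    num = 0.0
    den = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        # Positive terms: W resembles Y (or X resembles Z) at this SNP.
        num += (pw - px) * (py - pz)
        den += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    return num / den  # raises ZeroDivisionError if no SNP is informative

def jackknife_z(w, x, y, z, n_blocks):
    """Z-score of D using an unweighted delete-one block jackknife (assumed scheme)."""
    n_snps = len(w)
    d_all = d_statistic(w, x, y, z)
    leave_one_out = []
    for b in range(n_blocks):
        start = b * n_snps // n_blocks
        stop = (b + 1) * n_snps // n_blocks
        keep = [i for i in range(n_snps) if not (start <= i < stop)]
        subsets = [[pop[i] for i in keep] for pop in (w, x, y, z)]
        leave_one_out.append(d_statistic(*subsets))
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in leave_one_out)
    return d_all / math.sqrt(variance)  # fails if every block gives the same D
```

A positive value from `d_statistic` corresponds to excess allele sharing between W and Y or X and Z, and a negative value to excess sharing between W and Z or X and Y, which matches the reading given above.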

Also, if you're struggling with the acronyms that I'm using in this post, this list should help.

I think it's pretty obvious that what we're seeing in these results is the very rapid formation of the present-day northern European gene pool during the LN/EBA period. Note that the D statistics and Z scores for all of the present-day samples basically reach 0 with Bell_Beaker_LN and/or Unetice_EBA.

In other words, practically the same process appears to have affected a massive stretch of the European continent, all the way from the Atlantic to what is now western Russia.

This correlates rather nicely with linguistic and archeological data. For instance, most of the groups in the analysis are Indo-Europeans, except for the Estonians and Hungarians, whose Uralic ancestors mixed heavily with surrounding Indo-European populations. The same D statistics with other Uralic and also southern European groups don't follow the same patterns.

However, to be honest, I'm not really sure what all of these stats say about the original focus of my experiment: the potential survival and rebound of EHG, SHG and WHG.

Paradoxically, it seems to me as if WHG-related ancestry survived better and/or rebounded more successfully in northeastern Europe, rather than northwestern Europe, where populations today show relatively higher affinity to Neolithic farmers, particularly Germany_MN. Any ideas?

By the way, I chose Karitiana Indians and Iranian Jews to represent (W,X) because, between them, they harbor a lot of MA1-related and Near Eastern ancestry, but no WHG or any other type of European ancestry. Thus, I was hoping that they'd be helpful in revealing fine-scale patterns of WHG-related affinity across space and time in Europe.

See also...

R1a1a from an Early Bronze Age warrior grave in Poland

Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)

Saturday, May 2, 2015

Complex admixture history and recent southern origins of Siberians

On a brighter note, here are some interesting Identity-by-Descent (IBD) maps from a very competent new preprint at bioRxiv on the admixture history of Siberians. You'll find the full collection in the paper's supp info PDF here.

The results suggest that Russians share minimal levels of IBD with Siberians, despite the well-documented presence of significant European admixture in many parts of Siberia. But this isn't really a contradiction because:

Furthermore, our results demonstrate that the European ancestry component detectable in many populations of southern, central, and northern Siberia is not the result of post-colonial Russian admixture as may have been expected [37] and as was suggested on the basis of ALDER analyses [38]. Rather, with the exception of the Dolgans and the Samoyedic-speaking groups of western Siberia, the European ancestry represents one of the most ancient components [dating to not more than 4500 years ago] of the complex admixture history of Siberian populations.

The ~4,500 YBP date makes a lot of sense. This is the Late Neolithic/Early Bronze Age period, which also saw massive population movements from the steppe deep into Europe (see here).


Pugach et al., The complex admixture history and recent southern origins of Siberian populations, bioRxiv, Posted April 30, 2015, doi:

Thursday, April 30, 2015

The enigma of the Kalash

Last year Garrett Hellenthal et al. claimed that the Kalash people of the Hindu Kush received a large pulse of admixture from somewhere in the west, possibly Europe, as late as 327–326 BCE. They even suggested that Alexander's soldiers may have been the culprits. But this was naive and wrong.

Now, Qasim Ayub et al. are claiming that the Kalash are an Ancient North Eurasian (ANE) population that has remained genetically isolated for the past 11,800 years. This is also naive and wrong.

One day, perhaps in the not too distant future, someone will study the population history of the Hindu Kush using ancient DNA and methods that actually work. What I think they will find is that the Kalash, just like most of their neighbors, are largely the result of an admixture event during the Bronze Age between Indo-Iranian migrants from the steppe and Central Asian agriculturists. They will confirm that the Kalash are an extreme isolate, but only since the Bronze Age, not the early Neolithic.

These results will correlate very nicely with mainstream linguistics and archeology, latest expansion dates for uniparental markers, and even common sense.


Ayub et al., The Kalash Genetic Isolate: Ancient Divergence, Drift, and Selection, The American Journal of Human Genetics (2015),

See also...

The teal people: did they actually exist, and if so, who were they?

Monday, April 27, 2015

The Steppe K10

For those of you who aren't yet fed up with reading about the Yamnaya nomads and the ancient Eurasian steppe, here are the results from my latest ADMIXTURE experiment:

Steppe K10 spreadsheet

The Fst distances between the ten components are interesting. The steppe component, which peaks at basically 100% among the Yamnaya genomes from Haak et al. 2015, is closest to the Hindu Kush component (0.058 Fst). Any idea what that might mean?

Steppe K10 Fst

I won't be offering to run this test for anyone, not at this stage anyway, but it's an easy analysis to reproduce. Just leave out all Indians from your dataset, and pack it with the ancient European LN/EBA samples from Haak et al. 2015 and South Central Asians from the HGDP. At high enough K, this should force the algorithm to find the ancient steppe and Hindu Kush components. Most of the Ancestral South Indian (ASI) type stuff will be siphoned off to the four East Eurasian clusters.
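The sample selection step described above could be sketched roughly like this in Python. Everything here is hypothetical: the function names, the idea of storing the population label as the first column, and whatever labels you put in the exclusion set are assumptions for illustration, not a record of how the Steppe K10 dataset was actually built. The output is a two-column, space-separated list of the kind that can be passed to PLINK's `--keep` option.

```python
def build_keep_list(samples, excluded_pops):
    """Drop every sample whose population label is in excluded_pops.

    samples: iterable of (population_label, sample_id) pairs (hypothetical layout).
    """
    return [(pop, sid) for pop, sid in samples if pop not in excluded_pops]

def format_keep_file(keep_list):
    """Render the kept samples as 'label id' lines, one sample per line."""
    return "\n".join(f"{pop} {sid}" for pop, sid in keep_list)
```

After filtering, the remaining samples would be extracted and run through ADMIXTURE at the chosen K as usual.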

See also...

Yamnaya-related ancestry proportions in Europe and west Asia

The teal people: did they actually exist, and if so, who were they?

Tuesday, April 21, 2015

West Eurasian mtDNA lineages in India

It might be interesting to compare the modern-day Indian mtDNA from this paper to the ancient European mtDNA from Haak et al. 2015. The data is freely available in the spreadsheet here. Anybody up for it?

There is no indication from the previous mtDNA studies that west Eurasian-specific subclades have evolved within India and played a role in the spread of languages and the origins of the caste system. To address these issues, we have screened 14,198 individuals (4208 from this study) and analyzed 112 mitogenomes (41 new sequences) to trace west Eurasian maternal ancestry. This has led to the identification of two autochthonous subhaplogroups-HV14a1 and U1a1a4, which are likely to have originated in the Dravidian-speaking populations approximately 10.5-17.9 thousand years ago (kya). The carriers of these maternal lineages might have settled in South India during the time of the spread of the Dravidian language. In addition to this, we have identified several subsets of autochthonous U7 lineages, including U7a1, U7a2b, U7a3, U7a6, U7a7, and U7c, which seem to have originated particularly in the higher-ranked caste populations in relatively recent times (2.6-8.0 kya with an average of 5.7 kya). These lineages have provided crucial clues to the differentiation of the caste system that has occurred during the recent past and possibly, this might have been influenced by the Indo-Aryan migration. The remaining west Eurasian lineages observed in the higher-ranked caste groups, like the Brahmins, were found to cluster with populations who possibly arrived from west Asia during more recent times.

Palanichamy et al., West Eurasian mtDNA lineages in India: an insight into the spread of the Dravidian language and the origins of the caste system, Human Genetics, 2015 Apr 2. [Epub ahead of print]

See also...

Indian genetic structure in the context of ancient European DNA

Friday, April 17, 2015

IBS similarity analysis: 60 ancient genomes + 233 present-day pops

IBS stands for Identical-by-State. The full output is available in a zip file here. Below are a few examples in chronological order. Most of these genomes are from Haak et al. 2015.

MA1 or Mal'ta boy


La Brana-1

Motala_HG I0117

Samara_HG I0124

HungaryGamba_HG KO1

Stuttgart LBK380

HungaryGamba_EN NE1

Spain_EN I0412

Spain_MN I0406

Esperstedt_MN I0172

Oetzi the Iceman

Yamnaya I0429

Corded_Ware_LN I0104

Bell_Beaker_LN I0113

Unetice_EBA I0047

HungaryGamba_BA BR2

Hinxton4 ERS389798

Hinxton2 ERS389796

The results obviously make a lot of sense. Also, please note that my Principal Component Analyses (PCA) are usually based on IBS similarity, so it's a method that I have a lot of confidence in. Here are some examples from a few weeks ago featuring samples from the IBS zip file.

Karelia_HG I0061 PCA

Yamnaya I0231 PCA

Yamnaya I0443 PCA

Corded_Ware I0103 PCA

Bell_Beaker I0112 PCA
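For readers unfamiliar with IBS similarity, here's a minimal Python sketch of one common way to score it between two samples: the average proportion of alleles shared per SNP, skipping sites where either genotype is missing. The 0/1/2 genotype coding, the use of `None` for missing calls and the function name are assumptions for illustration. This isn't necessarily how the stats in the zip file were produced, and low-coverage ancient genomes often need extra handling that isn't shown here.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state.

    g1, g2: genotypes coded as 0, 1 or 2 copies of an allele, None if missing.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs without a call in both samples
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        compared += 1
    if compared == 0:
        raise ValueError("no overlapping genotype calls")
    return shared / (2 * compared)
```

Scoring one ancient genome against each present-day population this way and sorting the results gives a ranked similarity list, and a matrix of pairwise scores is the sort of input that a PCA based on IBS similarity would start from.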

Update 18/04/2015: Matt posted this PCA based on the IBS similarity stats in the comments section. Sardinians and Samaritans appear to be the two obvious outliers within West Eurasia, which is probably because they harbor significantly lower levels of admixture from the steppe and/or Central Asia than their neighbors.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

See also...

Hinxton ancient genomes roundup

Friday, April 3, 2015

The teal people: did they actually exist, and if so, who were they?

The ADMIXTURE analysis in Haak et al. 2015 includes a series of intriguing teal colored components from K=16 to K=20 (see image here). The main reason I'm so intrigued by these components is that they generally make up over 40% of the genetic structure of the potentially Proto-Indo-European Yamnaya genomes.

But there's only so much one can learn by staring at a bar graph, so I thought I'd have a go at isolating the same signal with ADMIXTURE to study it in more detail. You can view the results of my experiment in the spreadsheet here.

I wasn't able to completely nail any one of the teal components from Haak et al., because I don't have access to all of the samples used in the paper (I'd have to sign a waiver to get them). Nevertheless, the signal looks basically the same.

Below is a bar graph based on the output featuring selected populations and ancient genomes from Europe and Asia. The Fst genetic distances between the nine components are available here.

Note that the teal component peaks in the Caucasus and the Hindu Kush, and generally shows a strong correlation with regions of relatively high MA1-related or Ancient North Eurasian (ANE) admixture. On the other hand, the orange component peaks among Early European Farmers (EEF), who basically lack ANE.

To learn about the structure of the three main West Eurasian components - blue, orange and teal - I made synthetic individuals from the P output to represent each of the components, and tested them with my K8 model. As expected, the teal component harbors a high level of ANE, while the orange component lacks it altogether. Refer to the spreadsheet here.
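One plausible way to make synthetic individuals from a P output is sketched below in Python. It assumes the P file is a whitespace-separated table with one row per SNP and one column of allele frequencies per component, and it draws each diploid genotype as two independent picks at that frequency. The function names, the seed handling and the independence between SNPs are all assumptions for illustration, so this may differ from the exact procedure used for the spreadsheet.

```python
import random

def read_p_column(p_text, component):
    """Return the allele frequencies of one component (0-based column index)."""
    return [float(line.split()[component]) for line in p_text.splitlines() if line.strip()]

def synthetic_genotypes(frequencies, n_individuals, seed=0):
    """Simulate diploid genotypes (0, 1 or 2 allele copies) for one component."""
    rng = random.Random(seed)
    individuals = []
    for _ in range(n_individuals):
        # Two independent allele draws per SNP at the component's frequency.
        genotype = [(rng.random() < p) + (rng.random() < p) for p in frequencies]
        individuals.append(genotype)
    return individuals
```

The simulated genotypes can then be written out in whatever format the downstream test expects and analyzed like any other sample.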

It's very likely that the teal and orange components from Haak et al. share these traits. I think this is more than obvious by looking at their frequencies across space and time in Eurasia.

I also analyzed the synthetic individuals with PCA based on their K8 ancestry proportions. The samples representing the orange component fall just south of the Stuttgart genome from Neolithic Germany, and this is basically where I expect Neolithic genomes from the Near East to cluster when they become available.

Interestingly, the samples representing the blue component are dead ringers for Scandinavian hunter-gatherers (SHG). However, I suspect this is something of a coincidence caused by the small number of Western European hunter-gatherer (WHG) and Eastern hunter-gatherer (EHG) genomes in the dataset. The algorithm probably doesn't have enough variation to latch onto to create both WHG and EHG components, and in the end settles for something in between, which just happens to resemble SHG.

But the fact that the orange and blue samples more or less pass for ancient populations leaves open the possibility that the same might be said for the teal samples.

So did the teal people actually exist, and if so, who were they?

My view at the moment is that a population very similar to the teal samples formed in Central Asia or the North Caucasus during the Neolithic as a result of admixture between MA1-like and Near Eastern groups. This population, I believe, then expanded into the Russo-Kazakh steppe by the onset of the Eneolithic.

Were they perhaps the Proto-Indo-Europeans? Probably not. I'd say they were Neolithic farmers who eventually played a role in the formation of the Proto-Indo-Europeans. In any case, someone had to bring the Caucasian or Central Asian admixture to the steppe, and I have it on good authority that it was already present among the Khvalynsk population of the Eneolithic, albeit at a lower level than among the Yamnaya of the early Bronze Age.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

See also...

Modeling the Yamnaya with qpAdm