Sunday, February 19, 2017

Phylogeography of Y-haplogroup Q3-L275

BMC Evolutionary Biology has a decent new paper on the phylogeography of Y-haplogroup Q3-L275. It would've been a great paper a couple of years ago, but I think that nowadays papers like this should also come with a few kick ass ancient samples to help make their point, otherwise they just feel like a prelude to something else. In this case it's probably a matter of funding and logistics, because the authors appear to be aware of the pitfalls of working with modern-day data:

Haplogroup Q3-L275 results from the first known split within haplogroup Q, which occurred in the Paleolithic epoch: according to previous studies [15, 24], haplogroup Q split into the Q3-L275 and Q1’2-L472 branches around 35 ky ago. Thus the location of this split might help identify the homeland of haplogroup Q, from where it spread throughout Eurasia and the Americas. Our findings better support a West Asian or Central Asian homeland of Q3 than any other area: a higher frequency was found in West Asia and in neighboring Pakistan; and early branches were identified in West Asia, Central Asia and South Asia. Increasing the dataset of ancient DNA might in future identify additional early branches, helping to locate a possible homeland more precisely. The very few samples from present-day (Additional file 3: Table S2) or ancient [43] China do not contradict this hypothesis, as they came from the western provinces located in Central Asia or historically linked to this area. The single Portuguese sample likely reflects the origin of the carrier, rather than more general population history. Thus, Q3 was one of the Paleolithic West Eurasian haplogroups. Its West/Central Asian homeland proposed here is hypothetical, because present-day genetic patterns do not necessarily reflect ancient ones as these can be modified by the more recent demographic events.

I like this diagram. But again, it would've been even better if augmented by a sprinkling of high resolution ancient samples.

Balanovsky et al., Phylogeography of human Y-chromosome haplogroup Q3-L275 from an academic/citizen science collaboration, BMC Evolutionary Biology, 2017, 17(Suppl 1):18, DOI: 10.1186/s12862-016-0870-2

Thursday, February 16, 2017

The Khvalynsk men #2

I didn't run any mixture models of the Khvalynsk men in my original post on these three individuals from the 5200-4000 BCE Eneolithic cemetery at Khvalynsk, Samara Oblast, Russia. That's because at the time I felt that I didn't have the right reference samples and outgroups to produce convincing results. But this is no longer an issue, so here goes, using qpAdm.


Anatolia_Chalcolithic 0.070±0.059
Caucasus_HG 0.136±0.050
Eastern_HG 0.794±0.037
chisq 7.964 tail_prob 0.716545

Anatolia_Chalcolithic 0.046±0.065
Caucasus_HG 0.155±0.058
Eastern_HG 0.799±0.038
chisq 7.970 tail_prob 0.715965

Anatolia_Chalcolithic 0.195±0.200
Caucasus_HG 0.238±0.192
Eastern_HG 0.567±0.076
chisq 10.965 tail_prob 0.446237


Anatolia_Chalcolithic 0.082±0.048
Caucasus_HG 0.135±0.042
Eastern_HG 0.783±0.030
chisq 5.610 tail_prob 0.898074

Caucasus_HG 0.065±0.047
Eastern_HG 0.804±0.025
Iran_Chalcolithic 0.132±0.051
chisq 6.909 tail_prob 0.806405

Caucasus_HG 0.147±0.033
Eastern_HG 0.797±0.027
Lengyel_LN 0.057±0.036
chisq 7.040 tail_prob 0.795835

Caucasus_HG 0.105±0.055
Eastern_HG 0.809±0.028
Iran_Late_Neolithic 0.086±0.051
chisq 8.304 tail_prob 0.685822

Armenia_Chalcolithic 0.130±0.056
Caucasus_HG 0.088±0.043
Eastern_HG 0.782±0.030
chisq 9.121 tail_prob 0.610719


Yamnaya_Samara:I0429 (3339-2917 calBCE)
Anatolia_Chalcolithic 0.190±0.063
Caucasus_HG 0.277±0.056
Eastern_HG 0.533±0.034
chisq 11.732 tail_prob 0.38412

I tried a number of different combinations of reference samples, and the three I settled on produced the best fits and lowest standard errors overall. That doesn't mean they literally show what happened; they're just the best we've got for the time being.
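For anyone who wants to tabulate output like this, here's a minimal Python sketch that parses the "source coefficient±error" lines shown above and checks that the mixture coefficients add up to roughly 1. It is not part of qpAdm itself; the function names parse_model and coefficient_sum are hypothetical, and the only data used is the Yamnaya_Samara I0429 model from this post.

```python
import re

# Matches lines such as "Eastern_HG 0.794±0.037": label, coefficient, standard error.
# The "chisq ... tail_prob ..." line has no "±" and is therefore skipped.
LINE_RE = re.compile(r"^(\S+)\s+([0-9.]+)±([0-9.]+)$")


def parse_model(text):
    """Return a list of (source, coefficient, std_error) tuples (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows


def coefficient_sum(rows):
    """Sum the mixture coefficients; a well-formed model should total roughly 1."""
    return sum(coef for _, coef, _ in rows)


# Model copied verbatim from the post (Yamnaya_Samara:I0429).
yamnaya_i0429 = """
Anatolia_Chalcolithic 0.190±0.063
Caucasus_HG 0.277±0.056
Eastern_HG 0.533±0.034
chisq 11.732 tail_prob 0.38412
"""

rows = parse_model(yamnaya_i0429)
for source, coef, se in rows:
    print(f"{source}: {coef:.1%} ± {se:.1%}")
print(f"sum of coefficients = {coefficient_sum(rows):.3f}")
```

The same two functions can be pointed at any of the other three-way models in this post, since they all share the same line layout.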

The results are very interesting, and perhaps unexpected, with Samara Eneolithic I0434 packing the highest ratio of Anatolia- and Caucasus-related ancestry, and, as per above, almost looking like he could be an early Yamnaya sample. I say perhaps unexpected because this individual belongs to Y-haplogroup Q1a and mitochondrial haplogroup U4a2, so his uniparental markers don't suggest any strong southern affinities.

But the result, even though only based on 13527 SNPs, looks robust enough, and it basically matches the Principal Component Analysis (PCA) that I featured in my original post.

Keep in mind that I0434 is the individual that appears to have been whacked over the head a few times and simply thrown into a ditch. Perhaps this suggests that the genetic shift in the Samara region from the Eneolithic to the Bronze Age, which saw the dilution of Eastern Hunter-Gatherer (EHG) ancestry by Anatolian- and Caucasus-related gene flows, was not always a peaceful and migrant-friendly process.

Wednesday, February 15, 2017

Post-ANE Siberian admixture in Middle Neolithic East Baltic foragers (?)

This hasn't been reported anywhere before, but it appears that at least one of the Latvian Middle Neolithic (MN) samples from Jones et al. 2017 harbors elevated post-Ancient North Eurasian (ANE) Siberian admixture.

If true, and it needs to be confirmed with more markers, then this individual, dated to ~6,000 cal BP, is the oldest European with this type of ancestry sequenced to date. Consider the following qpAdm models based on ~22K SNPs with Nganasans as the Siberian reference population:


Eastern_HG 0.788±0.096
Western_HG 0.135±0.078
Nganasan 0.076±0.038
chisq 10.493 tail_prob 0.486685

Eastern_HG 0.735±0.090
Western_HG 0.190±0.072
Nganasan 0.075±0.035
chisq 11.189 tail_prob 0.427555

I couldn't test Latvia_MN1 separately due to a lack of markers. However, using exactly the same setup on the older samples from Jones et al. 2017, the Nganasan-related signal fails to show for Latvia_HG and only registers at 0.5% for Ukraine_HG/N. But that 0.5% looks somewhat shaky considering that its standard error is about ten times larger. The other coefficients make good sense.

Eastern_HG 0.314±0.042
Western_HG 0.686±0.042
Nganasan 0
chisq 10.035 tail_prob 0.612908

Eastern_HG 0.676±0.153
Western_HG 0.319±0.129
Nganasan 0.005±0.053
chisq 11.114 tail_prob 0.433755
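One rough way to see why the 0.5% is shaky is to divide each Nganasan coefficient by its standard error. The Python sketch below does just that for three of the models above. It's only an illustration, not a formal test: the helper name estimate_to_error_ratio is hypothetical, and labeling the first two blocks as the Latvian Middle Neolithic models is an assumption based on the surrounding text.

```python
def estimate_to_error_ratio(coefficient, std_error):
    """Return how many standard errors the coefficient sits above zero."""
    return coefficient / std_error


# (coefficient, standard error) for Nganasan, copied from the models above.
# Assumption: the first two blocks are the Latvian Middle Neolithic models;
# the post does not label each block individually.
nganasan = {
    "Latvian MN model, first block": (0.076, 0.038),
    "Latvian MN model, second block": (0.075, 0.035),
    "Ukraine_HG/N": (0.005, 0.053),
}

ratios = {
    label: estimate_to_error_ratio(coef, se)
    for label, (coef, se) in nganasan.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} standard errors from zero")
```

The two Middle Neolithic estimates come out at around two standard errors from zero, while the Ukraine_HG/N estimate is less than a tenth of one standard error from zero.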

So, you're probably asking, does Latvia_MN-related ancestry explain the elevated Nganasan-related ancestry in modern-day far Northeastern Europeans such as Finns? Perhaps some of it, but not all of it. Note the slight drop in the Nganasan-related ancestry for the Finns with the inclusion of Latvia_MN in the model.

Lengyel_LN 0.305±0.020
Western_HG 0.135±0.014
Yamnaya_Samara 0.457±0.025
Nganasan 0.104±0.008
chisq 12.401 tail_prob 0.25911

Latvia_MN 0.137±0.113
Lengyel_LN 0.316±0.070
Western_HG 0.119±0.051
Yamnaya_Samara 0.354±0.123
Nganasan 0.074±0.020
chisq 1.429 tail_prob 0.99764

My verdict: the minor Nganasan-related signal in Latvia_MN, or at least Latvia_MN2, is probably real, and the extra Nganasan-related admixture in modern-day Finns possibly arrived in Northeastern Europe in several waves from the Middle Neolithic onwards, including with early speakers of Uralic languages during the Bronze or Iron Age.

Monday, February 13, 2017

Mitogenome diversity in Sardinians

Good stuff at Mol Biol Evol:

Sardinians are “outliers” in the European genetic landscape and, according to paleogenomic nuclear data, the closest to early European Neolithic farmers. To learn more about their genetic ancestry, we analyzed 3,491 modern and 21 ancient mitogenomes from Sardinia. We observed that 78.4% of modern mitogenomes cluster into 89 haplogroups that most likely arose in situ. For each Sardinian-Specific Haplogroup (SSH), we also identified the upstream node in the phylogeny, from which non-Sardinian mitogenomes radiate. This provided minimum and maximum time estimates for the presence of each SSH on the island. In agreement with demographic evidence, almost all SSHs coalesce in the post-Nuragic, Nuragic and Neolithic-Copper Age periods. For some rare SSHs, however, we could not dismiss the possibility that they might have been on the island prior to the Neolithic, a scenario that would be in agreement with archeological evidence of a Mesolithic occupation of Sardinia.

Olivieri et al., Mitogenome Diversity in Sardinians: a Genetic Window onto an Island's Past, Mol Biol Evol, Published: 08 February 2017, DOI:

Dr Patterson I presume

As many of you probably know, Harvard's Nick Patterson has been entrusted with the job of pinpointing the Proto-Indo-European homeland with ancient DNA. The Radcliffe Magazine has a feature on the topic titled The Man Who Breaks Codes. Here's an interesting quote from the feature:

At Radcliffe, Patterson is investigating ways in which DNA reveals how populations (and languages) spread throughout Eurasia. Speakers of Indo-European languages were living 2,500 years ago in western China, on the Russian steppes, on the Atlantic coast of Europe, and in India. He asks, How did this linguistic and genetic spreading out happen? Patterson has no plans for a book, but a series of linked scholarly articles is under way. Three are in various stages of completion, including one on the origin of the Celts in what is now Great Britain.

I'm guessing the author is talking about the Bell Beaker behemoth in that last sentence. Apparently it was supposed to be out late last year, but rumor has it that it keeps getting delayed for one reason or another. I have no idea what is really going on there, but quite frankly, I'd say we've all waited long enough for the release of a new ancient DNA dataset. So yeah, soon please.

American Midwest: home away from home

Potentially interesting factoid: the American Midwest harbors populations with some of the highest levels of European hunter-gatherer and Early Bronze Age steppe ancestry in the world today, because it was mainly settled by migrants from East Central Europe, Finland, Northern Germany and Scandinavia. Was this by coincidence or design (i.e., a preference for the Midwest climate)? I have no idea, but it's kind of cool.


Han, E. et al. Clustering of 770,000 genomes reveals post-colonial population structure of North America. Nat. Commun. 8, 14238 doi: 10.1038/ncomms14238 (2017).

Saturday, February 11, 2017

Yamnaya-related admixture in Bronze Age northern Iberia

The question of when ancient steppe or Yamnaya-related ancestry first entered Iberia is crucial to the Proto-Indo-European (PIE) homeland debate.

If the steppe or Kurgan PIE hypothesis is correct, then we'd expect this to have happened during the Bronze Age rather than, say, the Medieval Period with the migrations into Iberia of Northern Europeans likely rich in Yamnaya-related admixture like the Visigoths. That's because Indo-European languages are attested in Iberia as early as the Iron Age.

And indeed, the earliest Iberian sample in my dataset to show Yamnaya-related ancestry is Iberia_BA ATP9 from Günther et al. 2015, dated to 3,700–3,568 C14 cal yBP or the Middle Bronze Age. This has not been reported before, but I'm certain that my finding will be confirmed sooner or later in the scientific literature.

Many of you might remember that I already looked at this issue back in 2015 (see here). However, that analysis was based on a very limited sequence of ATP9. So I'm going to do it all over again with a higher quality sequence, and eventually delete the old post.

Let's start with a basic Principal Component Analysis (PCA) featuring ATP9 alongside a wide range of modern-day and ancient samples from West Eurasia and South Central Asia.

Clearly, ATP9 is shifted east, closer to Yamnaya, relative to the earlier Iberia Chalcolithic (Iberia_ChL) group, and almost clusters with Basques, who are known to harbor significant Yamnaya-related ancestry (see here). I can use formal statistics as well as models based on formal statistics to investigate this in more detail.

Mbuti Yamnaya_Samara Iberia_ChL Iberia_BA D 0.0031 Z 0.859
Mbuti Yamnaya_Samara Iberia_ChL Basque_French D 0.0086 Z 5.035
Mbuti Yamnaya_Samara Basque_French Iberia_BA D -0.0044 Z -1.316

Surprisingly, based on those D-stats ATP9 doesn't appear to share more drift with Yamnaya Samara relative to Iberia_ChL (Z<3). But I suspect this might be due to inflated hunter-gatherer ancestry in Iberia_ChL, so let's try something a little different.

Western_HG Yamnaya_Samara Iberia_ChL Iberia_BA D 0.0188 Z 4.987
Western_HG Yamnaya_Samara Iberia_ChL Basque_French D 0.024 Z 13.163
Western_HG Yamnaya_Samara Basque_French Iberia_BA D -0.0063 Z -1.768
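The two sets of D-statistics above can be screened against the |Z| ≥ 3 cutoff with a short script. Here's a minimal Python sketch, assuming the eight-field line layout used in this post (four population labels, then "D", its value, "Z", its value). The names DStat, parse_dstat and is_significant are hypothetical, and the threshold of 3 is simply the rule of thumb mentioned above.

```python
from typing import NamedTuple


class DStat(NamedTuple):
    pop1: str
    pop2: str
    pop3: str
    pop4: str
    d: float
    z: float


def parse_dstat(line):
    """Parse a line laid out as: pop1 pop2 pop3 pop4 D <value> Z <value>."""
    fields = line.split()
    return DStat(fields[0], fields[1], fields[2], fields[3],
                 float(fields[5]), float(fields[7]))


def is_significant(stat, threshold=3.0):
    """Flag a statistic whose absolute Z-score reaches the chosen threshold."""
    return abs(stat.z) >= threshold


# Lines copied verbatim from the two D-statistic runs above.
lines = """
Mbuti Yamnaya_Samara Iberia_ChL Iberia_BA D 0.0031 Z 0.859
Mbuti Yamnaya_Samara Iberia_ChL Basque_French D 0.0086 Z 5.035
Mbuti Yamnaya_Samara Basque_French Iberia_BA D -0.0044 Z -1.316
Western_HG Yamnaya_Samara Iberia_ChL Iberia_BA D 0.0188 Z 4.987
Western_HG Yamnaya_Samara Iberia_ChL Basque_French D 0.024 Z 13.163
Western_HG Yamnaya_Samara Basque_French Iberia_BA D -0.0063 Z -1.768
""".strip().splitlines()

stats = [parse_dstat(line) for line in lines]
for stat in stats:
    verdict = "significant" if is_significant(stat) else "not significant"
    print(f"{stat.pop1} {stat.pop2} {stat.pop3} {stat.pop4}: Z = {stat.z} ({verdict})")
```

With Mbuti in the first position only the Basque_French comparison passes the cutoff, whereas with Western_HG in that position the Iberia_ChL versus Iberia_BA comparison passes as well.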

OK, that's basically in line with the PCA above, and I can cement this finding with the qpAdm algorithm. Note the nice chunk of Early Bronze Age steppe (Steppe_EBA) ancestry in ATP9.


Iberia_BA ATP9
Caucasus_HG 0.038±0.063
Lengyel_LN 0.683±0.066
Steppe_EBA 0.177±0.087
Western_HG 0.102±0.044
chisq 5.216 tail_prob 0.876272

Caucasus_HG 0.014±0.028
Lengyel_LN 0.607±0.028
Nganasan 0.011±0.016
Onge 0.013±0.022
Steppe_EBA 0.273±0.043
Western_HG 0.059±0.020
Yoruba 0.021±0.006
chisq 1.605 tail_prob 0.978452

Lengyel_LN 0.590±0.027
Nganasan 0.009±0.016
Onge 0.015±0.022
Steppe_EBA 0.285±0.031
Western_HG 0.096±0.019
Yoruba 0.006±0.006
chisq 3.485 tail_prob 0.900346

Of course, Basques are not Indo-Europeans, so the fact that ATP9 has some Yamnaya-related ancestry doesn't necessarily mean she was an Indo-European. However, it's not unreasonable to assume that the ancestors of Basques incurred gene flow from early Indo-Europeans moving into the Iberian Peninsula, and this probably explains their relatively high level of Yamnaya-related ancestry. So ATP9 may well have spoken an Indo-European language, and if not, then like Basques she probably had Indo-European ancestry.

Friday, February 10, 2017

Lots of ancient mtDNA from Iberia

A new preprint on the maternal genetic history of the Iberian Peninsula has just appeared at bioRxiv. In all likelihood, it's a precursor to another paper focusing on genome-wide data from most of the same samples. Looks like we shouldn't expect any Yamnaya-related admixture in ancient Iberians until after the Early Bronze Age, unless it's all male mediated, which is possible but unlikely.

Abstract: Agriculture first reached the Iberian Peninsula around 5700 BCE. However, little is known about the genetic structure and changes of prehistoric populations in different geographic areas of Iberia. In our study, we focused on the maternal genetic make-up of the Neolithic (~ 5500-3000 BCE), Chalcolithic (~ 3000-2200 BCE) and Early Bronze Age (~ 2200-1500 BCE). We report ancient mitochondrial DNA results of 213 individuals (151 HVS-I sequences) from the northeast, middle Ebro Valley, central, southeast and southwest regions and thus on the largest archaeogenetic dataset from the Peninsula to date. Similar to other parts of Europe, we observe a discontinuity between hunter-gatherers and the first farmers of the Neolithic, however the genetic contribution of hunter-gatherers is generally higher and varies regionally, being most pronounced in the inland middle Ebro Valley and in southwest Iberia. During the subsequent periods, we detect regional continuity of Early Neolithic lineages across Iberia, parallel to an increase of hunter-gatherer genetic ancestry. In contrast to ancient DNA findings from Central Europe, we do not observe a major turnover in the mtDNA record of the Iberian Late Chalcolithic and Early Bronze Age, suggesting that the population history of the Iberian Peninsula is distinct in character.

Anna Szecsenyi-Nagy et al., The maternal genetic make-up of the Iberian Peninsula between the Neolithic and the Early Bronze Age, bioRxiv, Posted February 10, 2017, doi:

Thursday, February 9, 2017


I found a really good archaeological paper on the agricultural transition in what is now eastern Ukraine. It helps to explain not only the origins of agriculture on the Western Steppe, but probably also the ancestry of Khvalynsk, Yamnaya and other closely related steppe pastoralist groups, as a three-way mixture between North Eurasian foragers and early Balkan and Caucasus farmers. This fits very nicely with my qpAdm models showing significant Late Neolithic Lengyel-related input in Yamnaya (see here).

Abstract: This paper presents the results of the first archaeobotanical investigation of Neolithic-Chalcolithic period sites in eastern Ukraine and southwest Russia. The goal of this research is to understand the timeline of the earliest appearance and possible geographical origins of domesticated plant species in the region of study. The research conducted consists of the retrieval and study of macrobotanical remains and the analysis of plant impressions in pottery. Three possible corridors of influence upon agriculture in eastern Ukraine are postulated in this paper, originating from the Balkans, the Caucasus, and the Eurasian steppe.

At the same time, in contrast to what many still claim in the comments here and elsewhere, it's extremely unlikely now that Y-chromosome haplogroups R1a and R1b were introduced onto the steppe by these farmers (see here and here). Clearly, they appear to be paternal markers native to Eastern Europe, in so far as they've been present in the region since at least the Mesolithic.

It's rather improbable that we can say the same about the R1a and R1b in the Near East and South Asia, which of course means that we're edging closer and closer to solving the Indo-European Urheimat question, because R1a-M417 and R1b-M269 are by far the best candidates for the main Y-haplogroups of the Proto-Indo-Europeans (see here).


Giedre Motuzaite-Matuzeviciute, The earliest appearance of domesticated plant species and their origins on the western fringes of the Eurasian Steppe, Documenta Praehistorica, Vol 39 (2012), DOI:

See also...

Steppe boys, farmer girls

Irish Travellers

Open access at Scientific Reports:

Abstract: The Irish Travellers are a population with a history of nomadism; consanguineous unions are common and they are socially isolated from the surrounding, ‘settled’ Irish people. Low-resolution genetic analysis suggests a common Irish origin between the settled and the Traveller populations. What is not known, however, is the extent of population structure within the Irish Travellers, the time of divergence from the general Irish population, or the extent of autozygosity. Using a sample of 50 Irish Travellers, 143 European Roma, 2232 settled Irish, 2039 British and 6255 European or world-wide individuals, we demonstrate evidence for population substructure within the Irish Traveller population, and estimate a time of divergence before the Great Famine of 1845–1852. We quantify the high levels of autozygosity, which are comparable to levels previously described in Orcadian 1st/2nd cousin offspring, and finally show the Irish Traveller population has no particular genetic links to the European Roma. The levels of autozygosity and distinct Irish origins have implications for disease mapping within Ireland, while the population structure and divergence inform on social history.

Gilbert, E. et al. Genomic insights into the population structure and history of the Irish Travellers. Sci. Rep. 7, 42187; doi: 10.1038/srep42187 (2017).