

Saturday, November 22, 2014

3-Pop analysis featuring 19 ancient reference genomes and over 80 present-day populations

Regular visitors will know what I've got here. Those of you who don't, please refer to page 82 of the Lazaridis et al. 2014 supp info. Also, you'll find brief descriptions of each of the ancient genomes from this analysis here.
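For those unfamiliar with the statistic, f3(C; A, B) is essentially the average, across SNPs, of (c - a)(c - b), where c, a and b are the allele frequencies in the test population C and the two reference populations A and B. A significantly negative value, judged by a Z-score from a block jackknife, indicates that C looks like a mixture of populations related to A and B. Below is a minimal Python sketch under simplifying assumptions: allele frequencies are already computed, the finite-sample-size correction applied by ADMIXTOOLS' qp3Pop is left out, and the jackknife uses equal-sized runs of SNPs rather than blocks of fixed genetic length. The function names are hypothetical and the data is simulated, so this illustrates the idea rather than reproducing the results in this post.

```python
import math
import random


def f3_terms(c, a, b):
    """Per-SNP terms (c - a) * (c - b) of f3(C; A, B), from allele frequencies."""
    return [(ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)]


def jackknife_z(values, n_blocks):
    """Mean of the per-SNP values and its Z-score from a delete-one-block jackknife.

    Simplification (assumption): blocks are equal-sized runs of consecutive SNPs
    and the jackknife is unweighted.
    """
    n = len(values)
    size = math.ceil(n / n_blocks)
    blocks = [values[i:i + size] for i in range(0, n, size)]
    total = sum(values)
    # Mean of all SNPs except those in one block, for each block in turn
    loo = [(total - sum(blk)) / (n - len(blk)) for blk in blocks]
    g = len(blocks)
    loo_mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((v - loo_mean) ** 2 for v in loo))
    estimate = total / n
    return estimate, estimate / se


# Toy example: C is a drift-free 50/50 mix of A and B, so f3 should be negative
rng = random.Random(0)
a = [rng.uniform(0.05, 0.95) for _ in range(10_000)]
b = [rng.uniform(0.05, 0.95) for _ in range(10_000)]
c = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
f3, z = jackknife_z(f3_terms(c, a, b), n_blocks=100)
print(f"f3 = {f3:.5f}, Z = {z:.1f}")
```

Because the toy C has no drift of its own, every per-SNP term equals -(a - b)^2 / 4, which is why the estimate comes out negative; real populations add a positive drift term that can mask admixture.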

3-Pop analysis spreadsheet

I had a few other groups in this run, including the BedouinB and Kalash, but their results didn't include any negative statistics, so I left them out of the spreadsheet. The majority of the top (i.e., most negative) results involve Stuttgart and MA-1, which is not unexpected. Here are some of the other top outcomes that caught my eye:

Basque_French; Stuttgart, Loschbour f3: -0.005569 Z-score: -3.146

Basque_Spanish; Stuttgart, Loschbour f3: -0.004972 Z-score: -2.673

Burusho; Saqqaq, BR2 f3: -0.001463 Z-score: -1.003

Finnish; MA-1, Gokhem2 f3: -0.009306 Z-score: -4.714

French_South; Stuttgart, La_Brana-1 f3: -0.008885 Z-score: -4.8

Lithuanian; MA-1, Gokhem2 f3: -0.007962 Z-score: -4.088

Sardinian; Stuttgart, La_Brana-1 f3: -0.00437 Z-score: -2.513

Spanish_Aragon; Stuttgart, Loschbour f3: -0.010528 Z-score: -5.636

Spanish_Murcia; Stuttgart, La_Brana-1 f3: -0.010499 Z-score: -5.409

Spanish_Pais_Vasco; Stuttgart, Loschbour f3: -0.010165 Z-score: -5.048

Turkish_Aydin; Saqqaq, NE1 f3: -0.013137 Z-score: -8.88

Turkmen; Stuttgart, Saqqaq f3: -0.013659 Z-score: -8.278
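To sort the outcomes above by strength of signal, a small Python sketch can parse them and apply a cut-off. The Z ≤ -3 threshold used here is a common convention for calling an f3-statistic significantly negative, not something stated in this post, so treat it as an assumption; by that yardstick three of the twelve outcomes (Basque_Spanish, Burusho and Sardinian) fall short of significance.

```python
import re

# The twelve outcomes listed above, in a uniform "Target; Ref1, Ref2 f3: ... Z-score: ..." layout
RESULTS = """\
Basque_French; Stuttgart, Loschbour f3: -0.005569 Z-score: -3.146
Basque_Spanish; Stuttgart, Loschbour f3: -0.004972 Z-score: -2.673
Burusho; Saqqaq, BR2 f3: -0.001463 Z-score: -1.003
Finnish; MA-1, Gokhem2 f3: -0.009306 Z-score: -4.714
French_South; Stuttgart, La_Brana-1 f3: -0.008885 Z-score: -4.8
Lithuanian; MA-1, Gokhem2 f3: -0.007962 Z-score: -4.088
Sardinian; Stuttgart, La_Brana-1 f3: -0.00437 Z-score: -2.513
Spanish_Aragon; Stuttgart, Loschbour f3: -0.010528 Z-score: -5.636
Spanish_Murcia; Stuttgart, La_Brana-1 f3: -0.010499 Z-score: -5.409
Spanish_Pais_Vasco; Stuttgart, Loschbour f3: -0.010165 Z-score: -5.048
Turkish_Aydin; Saqqaq, NE1 f3: -0.013137 Z-score: -8.88
Turkmen; Stuttgart, Saqqaq f3: -0.013659 Z-score: -8.278
"""

PATTERN = re.compile(
    r"^(?P<target>\S+); (?P<ref1>\S+), (?P<ref2>\S+) "
    r"f3: (?P<f3>-?[\d.]+) Z-score: (?P<z>-?[\d.]+)$"
)


def parse_results(text):
    """Return (target, ref1, ref2, f3, z) tuples for every line matching PATTERN."""
    rows = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            rows.append((m["target"], m["ref1"], m["ref2"], float(m["f3"]), float(m["z"])))
    return rows


Z_CUTOFF = -3.0  # assumed convention: Z <= -3 counts as significantly negative

rows = parse_results(RESULTS)
significant = sorted((r for r in rows if r[4] <= Z_CUTOFF), key=lambda r: r[4])
weaker = [r[0] for r in rows if r[4] > Z_CUTOFF]
for target, ref1, ref2, f3, z in significant:
    print(f"{target:<20} {ref1} + {ref2:<12} Z = {z}")
print("Below the cut-off:", ", ".join(weaker))
```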

And a few observations based on the entire output:

- West Eurasians appear to have become less Mediterranean and Near Eastern after the Neolithic, or even the Copper Age.

- Populations closely related to MA-1 had a profound impact on the genetic structure of almost all West Eurasians and South and Central Asians.

- West Eurasians share basically the same ancestral components, although often in very different proportions (which unfortunately the f3-stats aren't able to show).

- The only groups in Europe that are better fitted as mixtures of Stuttgart and Loschbour or La Brana-1, as opposed to Stuttgart and MA-1, are found in southwestern Europe, which just happens to be the most distant part of Europe from the Eastern European/Central Asian steppe.

- Turkic and Uralic groups generally show stronger signals of admixture than Indo-European groups when Saqqaq is one of the reference genomes.

- Only two European groups, Lithuanians and Finns, are better fitted as Gokhem2/MA-1 than Stuttgart/MA-1, probably because they harbor the lowest levels of Near Eastern ancestry in this analysis.

- None of the European populations has any of the post-Copper Age genomes near the top of their stats, which suggests that the present-day European genetic structure formed very rapidly during or just after the Copper Age.

Feel free to post your own inferences from the data in the comments below. I'll try this again when more genomes roll in. I'm expecting quite a few to be published next year, especially from Central and Eastern Europe. Fingers crossed.

By the way, on a related note, here's a new paper on isotope data from various steppe groups, including Yamnaya:

A Multi-Isotopic Approach to the Reconstruction of Prehistoric Mobility and Economic Patterns in the West Eurasian Steppes 3500 to 300 BC

Every bit of new information on this very exciting topic is welcome, and I really don't want to sound ungrateful, but after reading it, the first thing that came to my mind was: thank God for ancient DNA.

Tuesday, November 18, 2014

Review paper: Human paleogenetics of Europe - The known knowns and the known unknowns

Many people, including me, are impatiently waiting for the new manuscript from the Reich Lab on the genetic shifts in Central and Eastern Europe during the late Neolithic/early Bronze Age, which will apparently include genome-wide data from Bell Beaker, Corded Ware and Yamnaya remains (see here). Rumor has it that it'll appear at bioRxiv within a few weeks.

Meantime, it might be useful to check out this review paper by Guido Brandt et al. on the present state of play in European paleogenetics.

Human paleogenetics of Europe - The known knowns and the known unknowns

It's a thorough summary of almost all ancient DNA results to date from Europe, and includes some very nice maps and other figures that look like updates on the stuff from Brandt et al. 2013 (see here). However, there are a couple of major problems with this paper that drag it down a few notches in my estimation.

Firstly, the authors leave open the possibility that Indo-European languages were introduced into Europe by early Neolithic farmers from Anatolia. Maybe they're trying to be diplomatic and humor those who won't let this failed hypothesis finally die, because otherwise I have no idea why they even considered it.

There are some very good reasons now why this is indeed a failed hypothesis. For one, linguistic evidence shows that all Indo-European languages in Europe include similar loans of non-Indo-European origin associated with farming, like the words for bean, carrot, hemp, oats and pea (for instance, see here).

These words were in all likelihood borrowed by the early Indo-Europeans from someone else as they spread out across Europe well after agriculture had been established throughout much of the continent. So who was this someone else? Probably the non-Indo-European descendants of the non-Indo-European early farmers from Anatolia.

Ancient DNA shows something similar. All ancient European genomes from a farming context sequenced to date, from the Neolithic to the Copper Age, are clearly distinct from present-day Indo-European-speaking Europeans. But they closely resemble present-day Sardinians, whose ancestors only became Indo-European speakers during the late Iron Age.

The other serious problem with this paper is the suggestion that present-day Northeast Europeans show the highest genome-wide affinity to Pitted Ware hunter-gatherers because the eastern Baltic acted as a refugium during the Last Glacial Maximum (LGM). It's on page 10 of the PDF.

This must be some sort of oversight, because I refuse to believe that the authors aren't aware that the eastern Baltic was covered by a big fuck off sheet of ice during the LGM. Here's a map from Mangerud et al. 2004.

A much more plausible explanation why present-day Northeast Europeans show the highest genome-wide affinity to Pitted Ware hunter-gatherers, and indeed all European hunter-gatherers for whom we have data, is that their ancestors were amongst the last people in Europe to take up farming and Christianity.


Brandt, G., et al., Human paleogenetics of Europe - The known knowns and the known unknowns, Journal of Human Evolution (2014).

Thursday, November 13, 2014

TreeMix graphs with Kostenki14 and Ust'-Ishim

First of all, here's a map with some basic info about these two Upper Paleolithic North Eurasian genomes. They're separated by less than 10,000 years and a couple thousand kilometers, so in theory they shouldn't be all that different.

Let's see what TreeMix has to say on the matter. Note that the graphs also include five other ancient genomes: Denisova, Altai Neanderthal, Loschbour, Stuttgart and BR2 (LBA_Hungary).

Admittedly, I'm still learning to use TreeMix. But with that in mind, I'd say the graphs above appear very reasonable, and show outcomes that generally fit with what I've seen elsewhere.

For instance, Denisova harbors something chimp-like that isn't shared with the Altai Neanderthal. This might be a signal of the introgression from an unidentified archaic hominin that has already been reported in the scientific literature.

As for Kostenki14, the graphs back one of the main conclusions of Seguin-Orlando et al. (i.e., the people who first analyzed and published this genome), in that it appears basal to later Europeans. However, the last two graphs suggest that this basal ancestry is not the same thing as the Stuttgart-related Basal Eurasian component described in Lazaridis et al., which, if I understand correctly, is what Seguin-Orlando et al. were saying.

In fact, the basal stuff carried by Kostenki14 seems to be related to the greater part of Ust'-Ishim's genetic makeup. I say the greater part, because Ust'-Ishim also appears to harbor Papuan-like ancestry not shared with Kostenki14.

Is there anything I can do to make these graphs more informative? Perhaps add or take away some samples? Feel free to let me know in the comments below.

By the way, I downloaded the Kostenki14, LBA_Hungary and Ust'-Ishim genomes from Genetic Genealogy Tools. The rest of the samples came from the Reich Lab's Human Origins dataset, available here.
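For anyone wanting to try something similar: TreeMix reads a gzipped text file whose first line lists the population names and whose remaining lines give, for each SNP, the two allele counts per population as comma-separated pairs. A minimal Python sketch of writing that format, assuming the per-population counts have already been tallied from the genotype files; the function name is hypothetical and the counts are made up for illustration.

```python
import gzip
import io


def write_treemix_input(pops, snps, handle):
    """Write TreeMix allele-count input.

    pops: population names, in column order.
    snps: one entry per SNP, each a list of (allele1_count, allele2_count)
          pairs in the same order as pops.
    """
    handle.write(" ".join(pops) + "\n")
    for snp in snps:
        if len(snp) != len(pops):
            raise ValueError("each SNP needs one count pair per population")
        handle.write(" ".join(f"{n1},{n2}" for n1, n2 in snp) + "\n")


# Hypothetical counts: a diploid modern panel plus two ancient genomes,
# the latter treated as pseudo-haploid (one allele observed per SNP)
pops = ["Mbuti", "Stuttgart", "Loschbour"]
snps = [
    [(10, 2), (1, 0), (0, 1)],
    [(7, 5), (0, 1), (1, 0)],
]

buffer = io.BytesIO()  # stands in for a file on disk, e.g. a hypothetical "treemix_input.gz"
with gzip.open(buffer, "wt") as fh:
    write_treemix_input(pops, snps, fh)

text = gzip.decompress(buffer.getvalue()).decode()
print(text)
```

The resulting file would then be passed to TreeMix with `-i`, with `-m` setting the number of migration edges to fit.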

Update 14/11/2014: After looking over the results above and reading the comments below, I made a few changes to the dataset and came up with a couple more graphs that I think are worth sharing. I'm quite certain now that the so-called Basal Eurasian ancestry carried by Stuttgart and Kostenki14 can't be lumped into a single component.

See also...

Kostenki14: first genome of an Upper Paleolithic European

Ust'-Ishim belongs to K-M526

Thursday, November 6, 2014

Kostenki14: first genome of an Upper Paleolithic European

At last, we have an ancient genome from pre-LGM Europe: Kostenki14 (K14) from the famous Kostenki Upper Paleolithic site in southern Russia. The paper, Seguin-Orlando et al. 2014, is locked away behind a paywall, but at least the supplementary materials are open access.

K14 is dated at 38,700-36,200 cal BP and belongs to Y-chromosome haplogroup C-M130, a basal and widespread paternal marker that has already been reported in three other ancient European genomes: La Brana-1 from Mesolithic Spain and NE5 and NE6 from Neolithic Hungary. It also belongs to mitochondrial (mtDNA) haplogroup U2, but we've actually known this since 2010 (see here).

The shared drift stats of the form f3(Mbuti;K14,X), where X is the test population, reveal that, among present-day Eurasians, this early European is most similar to Northeast Europeans, such as Lithuanians, Estonians and Belarusians, and some Western Europeans, like Basques and Orcadians (i.e., people from the Orkney Isles). This is also what we've seen from other indigenous European hunter-gatherer genomes sequenced to date.
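Shared drift stats of this kind are outgroup f3-statistics: with an outgroup O (here Mbuti) that is roughly equally distant from everyone tested, f3(O; K14, X) averages (o - k)(o - x) across SNPs and grows with the amount of genetic drift K14 and X share after splitting from O. A minimal Python sketch of computing and ranking them, using made-up allele frequencies and hypothetical population labels:

```python
def outgroup_f3(o, s, x):
    """f3(O; S, X) from allele frequencies: mean over SNPs of (o - s) * (o - x)."""
    return sum((oi - si) * (oi - xi) for oi, si, xi in zip(o, s, x)) / len(o)


def rank_by_shared_drift(outgroup, sample, tests):
    """Sort test populations from most to least drift shared with the sample."""
    scores = {name: outgroup_f3(outgroup, sample, freqs) for name, freqs in tests.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy allele frequencies at four SNPs (hypothetical, for illustration only)
outgroup = [0.5, 0.5, 0.5, 0.5]
sample = [0.9, 0.1, 0.8, 0.2]
tests = {
    "PopClose": [0.8, 0.2, 0.7, 0.3],    # drifted strongly in the same direction as the sample
    "PopDistant": [0.6, 0.4, 0.5, 0.5],  # drifted only slightly in the same direction
}
for name, score in rank_by_shared_drift(outgroup, sample, tests):
    print(name, round(score, 4))
```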

As far as Eurasians are concerned, Papuans and Melanesians are the most distinct from K14, somewhat paradoxically so, considering the ancient genome's Oceanian-like Y-haplogroup. The authors speculate that this might be because they carry ancestry from a very basal lineage that went its own way before the split between West Eurasians and East Asians. But I'm wondering whether this result can't simply be explained by the inflated Denisovan admixture among Oceanians (usually reported at around 5%)?

Indeed, there's no mention anywhere in the paper that K14 has Denisova ancestry. However, much like the recently published Ust'-Ishim genome, it shows significantly larger genomic tracts of Neanderthal origin than present-day Eurasians. The implication of this is obvious, and well covered elsewhere, so I won't go into it here.

Arguably the most controversial outcome of the study is that it shows K14 to be partly of Basal Eurasian origin. This is a highly divergent Eurasian clade first described in Lazaridis et al. (see here), and associated with Neolithic farmers. Seguin-Orlando et al. came to their conclusion via two sets of D-statistics and an ADMIXTURE run, which showed K14 to carry a component specific to the Middle East.

If true, then this finding debunks one of the main premises in Lazaridis et al., which is that Basal Eurasian admixture first arrived in Europe from the Middle East with Neolithic farmers. However, it doesn't debunk that paper's model of the formation of the modern European gene pool. Basically, for that to happen we'd need the Basal Eurasian component to show up in pre-Neolithic samples from Western and Central Europe.
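For reference, a D-statistic D(W, X; Y, Z) compares allele frequencies across four populations. It is close to zero when (W, X) and (Y, Z) form clean pairs on a tree, and deviates from zero when one of W or X shares extra alleles with one of Y or Z. Below is a minimal Python sketch of the frequency-based form, the sum of (w - x)(y - z) divided by the sum of (w + x - 2wx)(y + z - 2yz), applied to toy frequencies. In practice significance comes from a block jackknife, and the specific populations Seguin-Orlando et al. tested are in their supplement, not reproduced here.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies.

    Positive values suggest extra allele sharing between W and Y (or X and Z);
    negative values suggest W-Z (or X-Y) sharing.
    """
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return num / den


# Toy frequencies at two SNPs (hypothetical): W and Y have drifted together
w = [0.9, 0.1]
x = [0.5, 0.5]
y = [0.8, 0.2]
z = [0.5, 0.5]
print(round(d_statistic(w, x, y, z), 3))
```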

Nevertheless, David Reich (one of the co-authors of Lazaridis et al.) seemed so taken aback by the news that he suggested K14 might be contaminated. Or at least, he was reported to have made this suggestion (scroll down to the last paragraph here).

This is interesting because Reich is currently working on a paper that includes ancient genomes from the Samara Valley, which isn't too far away from the Kostenki site (see here). Judging by his reaction to K14's purported Basal Eurasian admixture, we can probably assume that the pre-Neolithic genomes he's analyzed from Russia don't show any signals of this type of ancestry.

In any case, the model devised by Seguin-Orlando et al., set out in the figure below, is actually very similar to the one in Lazaridis et al., with NEOL basically standing in for EEF (Early European Farmer) and MHG for WHG and SHG (Western European Hunter-Gatherer and Scandinavian Hunter-Gatherer, respectively).

However, the suggestion that the Yenisei Siberians carry MHG rather than ANE doesn't look right to me. Why would Siberians carry European rather than Siberian hunter-gatherer ancestry? I suspect the problem is that MHG is a composite of WHG and ANE (because, as we know, SHG are partly ANE). Thus, if the Yenisei Siberians do carry both ANE and WHG, because they might indeed harbor some ancient European admixture, then perhaps this is simply being classified as MHG? If so, then I suppose it's not technically wrong, but it does look confusing.


Seguin-Orlando et al., Genomic structure in Europeans dating back at least 36,200 years, Published Online November 6 2014, Science, DOI: 10.1126/science.aaa0114

Friday, October 31, 2014

Genetic continuity and shifts across the metal ages in the Carpathian Basin: analysis of ancient Hungarian genomes CO1, BR1 and IR1

The recent Gamba et al. paper on the genetic prehistory of the Great Hungarian Plain was an excellent piece of paleogenomic detective work. However, I feel that the authors could have done a little better with characterizing the genetic origins of their samples.

For instance, the Principal Component Analysis (PCA) appears to suffer from subtle projection bias, which is a common problem in ancient DNA studies (see here). Also, the model-based analyses, like the ADMIXTURE run, leave me wanting a lot more.

However, all of the samples are freely available online, including in user-friendly genotype format at Genetic Genealogy Tools. So I thought it might be useful to take a closer look at three of the genomes, spanning a 2,000-year period from the Copper Age to the Iron Age: CO1, BR1 and IR1.

The metal ages are a critical period of prehistory and early history in the making of modern Europe. It's a time of profound cultural changes, and as we now know, large-scale genetic shifts across the continent (see here). Indeed, the three aforementioned genomes clearly show that major genetic shifts took place on the Great Hungarian Plain from the Copper Age to the Iron Age. However, they also suggest strong genetic continuity in the region throughout this period.

CO1, the Copper Age genome from a Baden Culture burial, appears ridiculously Western European, and could easily pass for a present-day Sardinian in most analyses, even though it's most likely of Balkan and Near Eastern origin. It's very similar in that respect to another Copper Age sample, Oetzi the Iceman from the Tyrolean Alps.

One of the main reasons for this Sardinian-like genetic character is certainly its very low level of Ancient North Eurasian (ANE) admixture, probably less than five per cent. Almost everyone in West Eurasia has more these days, so they appear a lot more eastern.

Shared drift stats of the form f3(Mbuti;CO1,Test) - Eurogenes dataset

Shared drift stats of the form f3(Mbuti;CO1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

BR1 represents the Early Bronze Age (EBA) Mako Culture. It looks roughly like a cross between CO1 and someone from northeastern Europe with an unusually high level of hunter-gatherer ancestry, and also a fair whack of ANE. Indeed, after running a variety of tests, I'd say that BR1 has around 12% ANE (in other words, more than the Basques but less than the British, which fits with its position on the West Eurasian PCA).

So as far as I can see, the most parsimonious explanation for this result is a population movement into present-day Hungary from the northeast during the EBA, perhaps associated with the early Indo-Europeans and the not-so-pleasant effects of the 4.2 kiloyear event (see here).
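The post doesn't spell out which tests produced the ~12% ANE figure for BR1, but one standard way of estimating a mixture proportion like this is an f4-ratio: if X is a mix of a population related to B (proportion alpha) and one related to C, and A is a relative of B that took no part in the mixture, then alpha is approximately f4(A, O; X, C) / f4(A, O; B, C), with O an outgroup. A minimal Python sketch, assuming simulated drift-free frequencies so the answer is known in advance; the labels are hypothetical, and real data would need a jackknife and carefully chosen populations that satisfy the tree assumptions.

```python
import random


def f4(p1, p2, p3, p4):
    """f4(P1, P2; P3, P4): mean over SNPs of (p1 - p2) * (p3 - p4)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


def f4_ratio(a, o, x, b, c):
    """Estimated share of X's ancestry from the B side: f4(A, O; X, C) / f4(A, O; B, C)."""
    return f4(a, o, x, c) / f4(a, o, b, c)


# Toy, drift-free frequencies (hypothetical): X is 12% B-like and 88% C-like
rng = random.Random(1)
n = 5_000
o = [rng.uniform(0.1, 0.9) for _ in range(n)]
b = [rng.uniform(0.1, 0.9) for _ in range(n)]
a = [0.7 * bi + 0.3 * oi for bi, oi in zip(b, o)]  # A shares drift with B
c = [rng.uniform(0.1, 0.9) for _ in range(n)]
x = [0.12 * bi + 0.88 * ci for bi, ci in zip(b, c)]
print(round(f4_ratio(a, o, x, b, c), 3))
```

With no drift after the mixture, x - c equals 0.12(b - c) at every SNP, so the ratio recovers the mixing proportion exactly; with real data it is only an estimate.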

Interestingly, the 4A Oracle suggests that BR1 might in large part be a mixture of CO1 and KO1, which is another sample from Gamba et al., assigned to the Koros Culture of early Neolithic Balkan farmers, but with typically hunter-gatherer genetic structure. This opens up the possibility that people with unusually high levels of hunter-gatherer ancestry lived on the Great Hungarian Plain throughout the Neolithic, and the sampling by Gamba et al. was too patchy to find them.

However, it's not possible to get a genome like BR1 simply by mixing CO1 with KO1, because the hunter-gatherer-like sample is not eastern enough. In other words, it lacks ANE. I know this just by eyeballing a couple of PCA plots featuring KO1 and Motala12, a Scandinavian sample estimated by Lazaridis et al. to carry ~19% ANE (see here and here).

So there might well have been a resurgence in local hunter-gatherer DNA on the Great Hungarian Plain, and perhaps throughout much of Central Europe, after the Neolithic. Nevertheless, in my opinion this alone cannot explain the results in this case.
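The argument above, that no mix of CO1 and KO1 can reproduce BR1 because neither carries enough ANE, is a general property of mixtures: a blend of two sources can't exceed both of them in any ancestry component. A minimal Python sketch of a two-way fit in the spirit of (but not the actual algorithm behind) the 4 Ancestors Oracle: it finds the mixing proportion that minimises the Euclidean distance between a target's admixture proportions and a blend of two sources. The three-component vectors are made up for illustration and are not K15 results.

```python
import math


def best_two_way_mix(target, s1, s2):
    """Return (alpha, distance) for target ≈ alpha * s1 + (1 - alpha) * s2.

    alpha minimises the squared Euclidean distance and is clipped to [0, 1].
    """
    diff = [p - q for p, q in zip(s1, s2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        alpha = 0.5  # identical sources: every proportion gives the same blend
    else:
        alpha = sum((t - q) * d for t, q, d in zip(target, s2, diff)) / denom
        alpha = min(1.0, max(0.0, alpha))
    blend = [alpha * p + (1 - alpha) * q for p, q in zip(s1, s2)]
    distance = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, blend)))
    return alpha, distance


# Hypothetical proportions for three components: (farmer-like, hunter-gatherer-like, ANE-like)
source_1 = [0.6, 0.4, 0.0]
source_2 = [0.9, 0.1, 0.0]
on_segment = [0.81, 0.19, 0.0]  # exactly 0.3 * source_1 + 0.7 * source_2
needs_ane = [0.7, 0.2, 0.1]     # has an ANE-like share neither source carries

print(best_two_way_mix(on_segment, source_1, source_2))
print(best_two_way_mix(needs_ane, source_1, source_2))
```

The first target is fitted perfectly, while the second always leaves a residual, because no value of alpha can create an ANE-like share that is zero in both sources.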

Shared drift stats of the form f3(Mbuti;BR1,Test) - Eurogenes dataset

Shared drift stats of the form f3(Mbuti;BR1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

IR1, the Iron Age genome, is clearly mixed. In some ways, much like CO1 and BR1, it's also deceptively similar to present-day Western Europeans, which suggests that it's in large part of local origin. However, its uniparental markers (Y-haplogroup N-M231 and mitochondrial haplogroup G2a1) actually fit better in Siberia than anywhere in Europe, and its genome-wide DNA shows influences from the North Caucasus and Volga-Ural regions (refer to the 4A Oracle results below).

Because of its complex ancestry, I can't accurately estimate the level of ANE admixture in this genome. Nevertheless, the PCA and Eurogenes K15 suggest that it easily surpasses BR1 in this respect. Note, for instance, its position among the Kargopol Russians and North Ossetians on the global PCA plot, as well as its high Eastern Euro score in the Eurogenes K15.

What I think this hints at is that the present levels of ANE across Europe aren't the result of a single early Indo-European migration, but multiple population movements around the continent spanning the entire metal ages, although usually involving Indo-European groups, and the effects of isolation-by-distance.

By the way, IR1 comes from a burial site of the Mezocsat Culture, which is generally accepted to be of Cimmerian origin. The Cimmerians are usually described as a nomadic Indo-European people from the Kuban steppe, just north of the Caucasus, who were pushed west by the expanding Scythians. Apparently, they founded a variety of cultures in the Carpathian Basin and Balkans by imposing themselves as the ruling elite over the locals. It's remarkable how closely IR1's genetic structure fits this narrative.

Shared drift stats of the form f3(Mbuti;IR1,Test) - Eurogenes dataset

Shared drift stats of the form f3(Mbuti;IR1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

Also, here's a really cool map of Identity-by-Descent (IBD) hits of over 3 cM shared between IR1 and a wide range of present-day populations. It comes from a recent post at Vadim's blog (see here). The shared IBD peaks are found in East Central Europe and the Volga-Ural region, which makes sense.
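Maps like this start from a list of IBD segments shared between the ancient genome and each modern individual. A minimal Python sketch of the basic tally, assuming segment lengths in centimorgans have already been called by an IBD detection tool. Because reference panels differ in size, the count is divided by the number of sampled individuals per population; that normalisation is an assumption on my part, not a description of how Vadim made his map. Population labels and numbers are hypothetical.

```python
from collections import Counter


def ibd_hits_per_individual(segments, sample_sizes, min_cm=3.0):
    """Count IBD segments longer than min_cm per population, divided by sample size.

    segments: (population, length_in_cM) pairs for segments shared with the ancient genome.
    sample_sizes: number of modern individuals tested in each population.
    """
    counts = Counter(pop for pop, length in segments if length > min_cm)
    return {pop: counts[pop] / n for pop, n in sample_sizes.items()}


segments = [("PopA", 3.4), ("PopA", 2.1), ("PopB", 5.0), ("PopB", 3.2), ("PopC", 1.5)]
sample_sizes = {"PopA": 2, "PopB": 1, "PopC": 5}
hits = ibd_hits_per_individual(segments, sample_sizes)
print(hits)
```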

Sunday, October 26, 2014

Hinxton ancient genomes roundup

Most visitors here are probably aware by now that the Iron Age genomes from Hinxton are the two male samples 1 and 4 (ERS389795 and ERS389798, respectively). You can find confirmation of this at the link below.

Anglo-Saxons left language, but maybe not genes to modern Britons

As for the main thrust of the article above, I'm not sure there's much point in discussing whether the British today are mostly of Celtic or Anglo-Saxon stock based on just five ancient genomes from a single location in England. However, if I were told that Hinxton4, the only high-coverage genome in this collection, was a modern sample, I'd say it belonged to an Irishman from western Ireland, rather than an Englishman from eastern England.

Thus, unless Hinxton4 was an ancient migrant from Ireland, it does seem to me as if there was a fairly significant admixture event in England between the indigenous Irish-like Celts and newcomers from the east, which eventually resulted in the present-day English population.

In any case, there are indeed some noticeable differences between the two sets of samples, and these can be visualized by plotting their f3 shared drift statistics on graphs.

For instance, plotting the f3-statistics of Hinxton2, which actually looks like a genome that might belong to someone straight off a boat from the Jutland Peninsula, against those of Hinxtons 1 and 4, we see that the former shares most drift with the Danes. Moreover, the Danes, Swedes and Germans, all Germanic speakers of course, deviate strongly on both graphs from the lines of slope that run from the Erzya to the Irish. They deviate from these lines because they share too little drift with Hinxtons 1 and 4 compared to the other reference populations from Northwestern Europe, especially the Irish.

A similar pattern can be seen when plotting the average results of Hinxtons 1 and 4 against those of 2, 3 and 5. However, the effect isn't nearly as pronounced, possibly because Hinxtons 3 and 5 are of mixed Celtic/Germanic origin. In fact, I suspect that Hinxton1 is also mixed, and probably has some ancestry from western Scandinavia, but I'll leave that for another time.
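Deviations like the ones described here can be put into numbers by measuring how far each population sits from the straight line through two anchor populations (in these graphs, the Erzya and the Irish). A minimal Python sketch, with made-up f3 values and hypothetical labels; x stands for shared drift with one ancient sample and y for shared drift with the other.

```python
def residuals_from_anchor_line(points, anchor1, anchor2):
    """Vertical distance of each point from the line through two anchor points.

    points: {name: (x, y)}; anchor1 and anchor2 must be keys of points.
    Positive residuals mean a population shares more drift with the y-axis
    sample than the anchor line predicts from its x value.
    """
    (x1, y1), (x2, y2) = points[anchor1], points[anchor2]
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return {name: y - (slope * x + intercept) for name, (x, y) in points.items()}


# Hypothetical shared-drift values (x, y) for illustration only
points = {
    "Anchor1": (0.200, 0.198),
    "Anchor2": (0.240, 0.230),
    "OnLine": (0.220, 0.214),
    "OffLine": (0.210, 0.225),
}
res = residuals_from_anchor_line(points, "Anchor1", "Anchor2")
for name, r in sorted(res.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:<8} {r:+.4f}")
```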

See also...

Analysis of an ancient genome from Hinxton

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Analysis of Hinxton5 - ERS389799

Friday, October 24, 2014

Analysis of Hinxton5 - ERS389799

Hinxton5, or ERS389799, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton5 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

Despite its relatively low North Sea score in the Eurogenes K15, and its pronounced western shift on the Principal Component Analysis (PCA) plots, this genome appears mostly Germanic. In my opinion, the shared drift stats and oracle results are quite convincing in this regard. If this were a modern sample, it could probably pass for 3/4 north Dutch and 1/4 Irish. By the way, the Sub-Saharan admixture just looks like noise; this is, after all, a low-coverage genome.

Shared drift stats of the form f3(Mbuti;Hinxton5,Test) - Eurogenes dataset

Shared drift stats of the form f3(Mbuti;Hinxton5,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Hinxton ancient genomes roundup