Thursday, September 18, 2014
Lazaridis et al. has finally made it into a journal. As far as I can see, the version that was published today in Nature doesn't differ in any significant way from the last draft at arXiv (see here). However, now that the paper is officially out, the authors have released their dataset. It can be downloaded here.
Below are a few Principal Component Analyses (PCA) featuring most of the samples, including six ancient genomes, each tested separately. They are, in order of appearance, Loschbour, La Brana-1, Motala12, MA-1 (ie. Mal'ta boy), Stuttgart and Oetzi the Iceman.
The plots look somewhat different from my previous efforts, even those with the same ancient genomes, such as here. That's mainly because this dataset offers more markers (for example, a whopping 427K versus 140K for MA-1), and often much higher quality markers chosen specifically for population genetics. Indeed, all of the plots below are based on around 200K SNPs, and that's after various quality controls. That's an improvement of well over 100K SNPs compared to many of the analyses I ran in the past.
Lazaridis et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, Nature, 513, 409–413 (18 September 2014), doi:10.1038/nature13673
Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans
Another look at the Lazaridis et al. ancient genomes preprint
The really old Europe is mostly in Eastern Europe
Friday, September 12, 2014
This is arguably one of the most intriguing abstracts from next month's ASHG 2014 conference:
Insights into British and European population history from ancient DNA sequencing of Iron Age and Anglo-Saxon samples from Hinxton, England. S. Schiffels, W. Haak, B. Llamas, E. Popescu, L. Loe, R. Clarke, A. Lyons, P. Paajanen, D. Sayer, R. Mortimer, C. Tyler-Smith, A. Cooper, R. Durbin.
British population history is shaped by a complex series of repeated immigration periods and associated changes in population structure. It is an open question however, to what extent each of these changes is reflected in the genetic ancestry of the current British population. Here we use ancient DNA sequencing to help address that question. We present whole genome sequences generated from five individuals that were found in archaeological excavations at the Wellcome Trust Genome Campus near Cambridge (UK), two of which are dated to around 2,000 years before present (Iron Age), and three to around 1,300 years before present (Anglo-Saxon period). Good preservation status allowed us to generate one high coverage sequence (12x) from an Iron Age individual, and four low coverage sequences (1x-4x) from the other samples. By providing the first ancient whole genome sequences from Britain, we get a unique picture of the ancestral populations in Britain before and after the Anglo-Saxon immigrations. We use modern genetic reference panels such as the 1000 Genomes Project to examine the relationship of these ancient samples with present day population genetic data. Results from principal component analysis suggest that all samples fall consistently within the broader Northern European context, which is also consistent with mtDNA haplogroups. In addition, we obtain a finer structural genetic classification from rare genetic variants and haplotype based methods such as FineStructure. Reflecting more recent genetic ancestry, results from these methods suggest significant differences between the Iron Age and the Anglo-Saxon period samples when compared to other European samples. We find in particular that while the Anglo-Saxon samples resemble more closely the modern British population than the earlier samples, the Iron Age samples share more low frequency variation than the later ones with present day samples from southern Europe, in particular Spain (1000GP IBS). 
In addition the Anglo-Saxon period samples appear to share a stronger older component with Finnish (1000GP FIN) individuals. Our findings help characterize the ancestral European populations involved in major European migration movements into Britain in the last 2,000 years and thus provide more insights into the genetic history of people in northern Europe.
So in other words, the Iron Age Britons, presumably of Celtic origin, share inflated levels of rare (ie. low frequency) alleles with Spaniards. Assuming these are pre-Roman samples, and it does seem that way, then the results suggest there were direct genetic ties between the British Celts and Mediterranean populations even before the Romans crossed the channel. I wonder if this is the Bell Beaker pimp juice talking?
Conversely, the Anglo-Saxons are more Finnish-like. But I wouldn't read too much into this result, because Finns are the only northern European population from east of England in the 1000 Genomes Project, so they're probably just acting as a proxy for gene flow from the far north of what is now Germany.
Interestingly, these signals aren't all that difficult to pick up in present-day English genomes. Below, for instance, are two sets of Eurogenes K15 ancestry proportions for English samples from Cornwall and Kent, respectively.
Note that both groups are typically Northwest European. However, the English from Cornwall are clearly more West Med, while those from Kent are slightly more North Sea, Baltic and Eastern Euro. The West Med component peaks in Sardinia, but also occurs at relatively high frequencies in Iberia, while the North Sea, Baltic and Eastern Euro components are well represented among the Finns.
These differences aren't jaw-dropping, but they're certainly noticeable. They also make perfect sense in the light of the ancient genomic data, because Cornwall is arguably one of the regions of the UK least affected by the Anglo-Saxon invasions. Kent, on the other hand, was settled by the Jutes during the 5th century. These people weren't Anglo-Saxons, but nonetheless a very similar Germanic tribe from the Jutland Peninsula.
English from Cornwall
English from Kent
See also...
Corded Ware Culture linked to the spread of ANE across Europe
Wednesday, September 10, 2014
A news feature on the Lazaridis et al. preprint has just appeared at Science. The full text is behind a paywall, but the freely available intro and graphic, of the Corded Ware horizon at its maximum extent, betray the main points of the article.
Three-part ancestry for Europeans: Eurasian “ghost lineage” contributed to most modern European genomes
Lazaridis et al. has been online for almost a year, so it's not exactly breaking news, but the Science feature is actually based on a talk by one of the paper's co-authors, Dr. Johannes Krause, at the recent SMBE 2014 conference in Basel.
The interesting thing is that in both drafts of Lazaridis et al. the authors keep well clear of attributing the post-Neolithic spread of Ancient North Eurasian (ANE) admixture (ie. the Eurasian "ghost lineage") into Western and Central Europe to any specific archeological culture or linguistic group. But according to the Science article, Krause thinks that the Corded Ware Culture (CWC) might have been responsible. Indeed, the article adds that Dr. Wolfgang Haak expressed the same opinion in another SMBE talk.
Keep in mind that Haak has already published a paper on uniparental markers from CWC remains (see here). So perhaps he wasn't just speculating that CWC people pushed ANE deep into Europe? Maybe he already knew after sequencing a CWC genome? Or not, but in any case, we're certainly due for an ancient genome from the critically important late Neolithic/early Bronze Age period of European prehistory.
Update 11/09/2014: It looks like my hunch was right. Haak and others have managed to sequence genome-wide data from CWC skeletons, and a paper is in the works. The authors are presenting their findings at the ASHG 2014 conference next month.
Capture of 390,000 SNPs in dozens of ancient central Europeans reveals a population turnover in Europe thousands of years after the advent of farming. I. Lazaridis, W. Haak, N. Patterson, N. Rohland, S. Mallick, B. Llamas, S. Nordenfelt, E. Harney, A. Cooper, K. W. Alt, D. Reich.
To understand the population transformations that took place in Europe since the early Neolithic, we used a DNA capture technique to obtain reads covering ~390 thousand single nucleotide polymorphisms (SNPs) from a number of different archaeological cultures of central Europe (Germany and Hungary). The samples spanned the time period from 7,500 BP to 3,500 BP (Early Neolithic to Early Bronze Age periods) and most of them were previously studied using mtDNA (Brandt, Haak et al., Science, 2013). The captured SNPs include about 360,000 SNPs from the Affymetrix Human Origins Array that were discovered in African individuals, as well as about 30,000 SNPs chosen for other reasons (that are thought to have been affected by natural selection, or to have phenotypic effects, or are useful in determining Y-chromosome haplogroups). By analyzing this data together with a dataset of 2,345 present-day humans and other published ancient genomes, we show that late Neolithic inhabitants of central Europe belonging to the Corded Ware culture were not a continuation of the earlier occupants of the region. Our results highlight the importance of migration and major population turnover in Europe long after the arrival of farming.
Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans
Ancient North Eurasian (ANE) admixture across Europe & Asia
Corded Ware people: more versatile and healthier than Neolithic farmers
Wednesday, September 3, 2014
A preprint at bioRxiv reports on ancient DNA from the Balkans and Carpathian Basin. STA stands for Starčevo Culture, LBK for Linearbandkeramik Culture, and LBKT for LBK in Transdanubia.
The haplotype of the Mesolithic skeleton from the Croatian Island Korčula belongs to the mtDNA haplogroup U5b2a5 (Dataset S3). The sub-haplogroup U5b has been shown to be frequent in pre-Neolithic hunter-gatherer communities across Europe [28–30,32,33,45,46]. Contrary to the low mtDNA diversity reported from hunter-gatherers of Central/North Europe [28–30], we identify substantially higher variability in early farming communities of the Carpathian Basin including the haplogroups N1a, T1, T2, J, K, H, HV, V, W, X, U2, U3, U4, and U5a (Table 1). Previous studies have shown that haplogroups N1a, T2, J, K, HV, V, W and X are most characteristic for the Central European LBK and have described these haplogroups as the mitochondrial ʻNeolithic packageʼ that had reached Central Europe in the 6th millennium BC [36,37]. Interestingly, most of these haplogroups show comparable frequencies between the STA, LBKT and LBK, comprising the majority of mtDNA variation in each culture (STA=86.36%, LBKT=61.54%, LBK=79.63%). In contrast, hunter-gatherer haplogroups are rare in the STA and both LBK groups (Table 1).
Three STA individuals belong to the NRY haplogroup F* (M89) and two specimens can be assigned to the G2a2b (S126) haplogroup, and one each to G2a (P15) and I2a1 (P37.2) (Dataset S3, S5). The two investigated LBKT samples carry haplogroups G2a2b (S126) and I1 (M253). Furthermore, the incomplete SNP profiles of eight specimens potentially belong to the same haplogroups; STA: three G2a2b (S126), two G2a (P15), and one I (M170); LBKT: one G2a2b (S126) and one F* (M89) (Dataset S5).
So nothing really surprising there, as far as I can see. Y-haplogroup G2a has been found in plenty of other European Neolithic remains, so its status as the main Y-haplogroup of the men who introduced the Neolithic package from the Near East into Europe remains unchallenged. The I1 and I2 probably belonged to the descendants of indigenous European foragers who were incorporated into the ranks of the early farmers as they made their way north from Anatolia.
But the I1-M253 is interesting, because this is the first time that this haplogroup has been reported from prehistoric remains. I probably don't have to remind anyone that it's the most common paternal marker in Scandinavia today, and yet it was missing from two sets of bones belonging to Scandinavian foragers analyzed recently by Lazaridis et al. and Skoglund et al. This perhaps suggests that it did not enter Scandinavia until the Neolithic, or even later.
Anna Szécsényi-Nagy, Guido Brandt, Victoria Keerl, et al., Tracing the genetic origin of Europe's first farmers reveals insights into their social organization, bioRxiv, first posted online September 3, 2014, doi: http://dx.doi.org/10.1101/008664.
Tuesday, August 26, 2014
Whose results are these? Feel free to post your guesses in the comments section below. I'll reveal the answer and make the sample available online in a couple of days.
Eurogenes K15 results
4 Ancestors Oracle results
1 MA-1+Tabassaran+Tabassaran+Tabassaran @ 7.771513
2 Kalash+MA-1+Tabassaran+Tabassaran @ 7.785069
3 Lezgin+MA-1+Tabassaran+Tabassaran @ 7.960974
4 Kalash+Lezgin+MA-1+Tabassaran @ 7.96793
5 Kalash+Kalash+MA-1+Tabassaran @ 8.119039
Update 27/08/2014: OK, the sample is a composite of two Lezgins, a people from the Northeast Caucasus, and two Ancient North Eurasian (ANE) genomes from Upper Paleolithic Siberia: Mal'ta boy or MA-1 and Afontova Gora-2 or AG-2. It can be downloaded here.
I chose these two Lezgins because they showed higher than average levels of ANE ancestry (well over 30% in most tests). Basically, I wanted to see where a Lezgin-like individual with unusually high ANE, as well as a dab of WHG, would land on a Principal Component Analysis (PCA) or genetic map of West Eurasia. That's because I now believe that a population like this played a key role in the formation of the modern European gene pool during the early metal ages.
My rough estimate is that the composite genome is around 50% ANE, around 40% early European farmer (EEF), and a few per cent Western European hunter-gatherer (WHG). For a detailed description of these three ancestral components see here.
The outcome is very interesting, because it puts the composite more or less between the Maris and North Caucasians, which roughly translates to the Russo-Kazakh border. This is an area generally accepted to be part of the Proto-Indo-European (PIE) homeland, and fits with a recent suggestion that populations expanding from this region after the Neolithic might be responsible for the widespread occurrence of ANE across Europe today (see here).
However, formal statistics, rather than PCA, are the favored method for studying ancient genomes in scientific literature. So I thought I'd run f3 and D-statistics to see whether this composite was indeed the closest thing to a PIE individual in my dataset.
I picked a set of French samples as the test group, and chose French Basques as the main reference group, alongside the composite and a variety of populations that are documented as, or suspected of, carrying high levels of ANE. The assumption I made was that the French used to be like the French Basques, their non-Indo-European neighbors, before someone pushed in from the east and changed both their language and genomes.
The results can be seen in the spreadsheet below. Please note, if the f3-statistic is negative, then the target group is assumed to be admixed. Moreover, for D-statistics of the form D(W, X; Y, Z), if the Z-score is positive, then the gene flow occurred either between W and Y or X and Z. If the Z-score is negative, then the gene flow occurred either between W and Z or X and Y.
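For anyone who wants to see how the signs of these statistics behave, here's a minimal Python sketch using made-up allele frequencies. It isn't the pipeline used for the spreadsheet. The function names (`f3`, `d_stat`) and all of the frequencies are hypothetical, and the sketch skips both the sample-size bias correction and the block jackknife that real analyses use to get standard errors and Z-scores. It only shows the raw point estimates in their usual allele-frequency form.

```python
# Toy sketch of f3 and D-statistics on made-up allele frequencies.
# Assumptions: no sample-size bias correction and no block jackknife,
# so this yields point estimates only, not Z-scores.

def f3(target, ref1, ref2):
    """f3(target; ref1, ref2): mean of (c - a) * (c - b) over SNPs."""
    terms = [(c - a) * (c - b) for c, a, b in zip(target, ref1, ref2)]
    return sum(terms) / len(terms)


def d_stat(w, x, y, z):
    """D(W, X; Y, Z): sum of (w - x)(y - z), normalized to lie in [-1, 1]."""
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num / den


# Hypothetical allele frequencies at four SNPs (not real data).
pop_a = [0.1, 0.8, 0.3, 0.6]
pop_b = [0.9, 0.2, 0.7, 0.4]
mix = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]  # 50/50 mixture of A and B
like_a = [0.2, 0.7, 0.3, 0.6]  # frequencies resembling pop_a
like_b = [0.8, 0.3, 0.6, 0.4]  # frequencies resembling pop_b

print("f3(mix; A, B) =", f3(mix, pop_a, pop_b))
print("D(A, B; like_A, like_B) =", d_stat(pop_a, pop_b, like_a, like_b))
print("D(A, B; like_B, like_A) =", d_stat(pop_a, pop_b, like_b, like_a))
```

In this toy setup the mixed population gives a negative f3, the first D is positive because W pairs with Y and X with Z, and swapping Y and Z simply flips the sign.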
Sunday, August 24, 2014
PLoS ONE has a new paper on the genetic structure of Western Balkan populations. Here's the abstract:
Contemporary inhabitants of the Balkan Peninsula belong to several ethnic groups of diverse cultural background. In this study, three ethnic groups from Bosnia and Herzegovina - Bosniacs, Bosnian Croats and Bosnian Serbs - as well as the populations of Serbians, Croatians, Macedonians from the former Yugoslav Republic of Macedonia, Montenegrins and Kosovars have been characterized for the genetic variation of 660 000 genome-wide autosomal single nucleotide polymorphisms and for haploid markers. New autosomal data of the 70 individuals together with previously published data of 20 individuals from the populations of the Western Balkan region in a context of 695 samples of global range have been analysed. Comparison of the variation data of autosomal and haploid lineages of the studied Western Balkan populations reveals a concordance of the data in both sets and the genetic uniformity of the studied populations, especially of Western South-Slavic speakers. The genetic variation of Western Balkan populations reveals the continuity between the Middle East and Europe via the Balkan region and supports the scenario that one of the major routes of ancient gene flows and admixture went through the Balkan Peninsula.
Among the most eye-catching figures from the study is this TreeMix graph with ten migration edges or admixture events. Note the 44% migration edge running from the base of the Eastern European branch to the French. Is this perhaps a legacy of the Proto-Celts and early Germanics? In any case, something similar can be seen on this TreeMix graph from the supplementary PDF to Skoglund et al. 2014, where a French genome is modeled as a clade closely related to Upper Paleolithic Siberian forager MA-1, but with considerable Sardinian admixture.
Also, the position of the Poles at the tip of the tree, and thus near the North Russians, is somewhat curious. However, I know that several of these individuals are ethnic Poles from Estonia, so that might be the problem.
Update 25/08/2014: Here's a typical Eurogenes Principal Component Analysis (PCA) of West Eurasia with the new samples from this paper (Bosnians, Kosovars, Macedonians, Montenegrins and Serbs).
Kovacevic L, Tambets K, Ilumäe A-M, Kushniarevich A, Yunusbayev B, et al. (2014) Standing at the Gateway to Europe - The Genetic Structure of Western Balkan Populations Based on Autosomal and Haploid Markers. PLoS ONE 9(8): e105090. doi:10.1371/journal.pone.0105090
Tuesday, August 19, 2014
There's an intriguing new paper at the AJHB on the paternal ancestry of a population from Iron Age China. It argues that the Han Chinese are the result of fairly recent admixture events, with Y-chromosome haplogroup Q1a1 entering the ancestral territory of the Han, the Central Plain of China, only around 3,000 years ago from the northwest. It's probably a sign of things to come, not only for the Han but many populations generally thought to be genetically homogeneous.
Note also how the Y-chromosome haplogroups appear to be associated with different burial customs and inferred social status. Q1a1 was found in the remains of three aristocrats and eight commoners, most of them buried in the extended prostrate position typical of Bronze and Iron Age steppe nomads of what is now western China. Most of the other remains were buried in the extended supine position, characteristic of the populations of the Chinese Central Plain at the time. I've put the details into a spreadsheet here.
It'll be interesting to learn about the genome-wide genetic structure of the people who introduced haplogroup Q1a1 into the ancestral Han gene pool. Were they perhaps in large part of Ancient North Eurasian (ANE) origin? The reason I say this is because Q is the most common Y-chromosome haplogroup in the Americas, where ANE peaks today. It's also the sister clade of haplogroup R, which is the paternal marker of Mal'ta boy, or the MA-1 genome, the main reference sample for ANE.
Indeed, haplogroup R was expanding in a big way across Europe and West and Central Asia at about the same time as Q1a1 in China. It also probably came from the steppe and was in all likelihood associated with the spread of ANE deep into Europe.
Objectives: Y chromosome haplogroup Q1a1 is found almost only in Han Chinese populations. However, it has not been found in ancient Han Chinese samples until now. Thus, the origin of haplogroup Q1a1 in Han Chinese is still obscure. This study attempts to provide answer to this question, and to uncover the origin and paternal genetic structure of the ancestors of the Han Chinese.
Methods: Eighty-nine ancient human remains that were excavated from the presumed geographic source of the Han Chinese and dated to approximately 3,000 years ago were treated by the amelogenin gene polymerase chain reaction test, to determine their sex. Then, Y chromosome single nucleotide polymorphisms were subsequently analyzed from the samples detected as male.
Results: Samples from 27 individuals were successfully amplified. Their haplotypes could be attributed to haplogroups N, O*, O2a, O3a, and Q1a1. Analyses showed that the assigned haplogroup of each sample is correlated to the suspected social status and observed burial custom associated with the sample.
Conclusions: The origins of the observed haplotypes and their distribution in present day Han Chinese and in the samples suggest that haplogroup Q1a1 was probably introduced into the Han Chinese population approximately 3,000 years ago.
Yong-Bin Zhao et al., Ancient DNA evidence reveals that the Y chromosome haplogroup Q1a1 admixed into the Han Chinese 3,000 years ago, American Journal of Human Biology, Article first published online: 18 AUG 2014, DOI: 10.1002/ajhb.22604
Lots of ancient Y-DNA from China
First genome of an Upper Paleolithic human (Mal'ta boy)
Ancient European admixture in the Americas, or ancient Amerindian admixture in Europe?
Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans