Monday, October 20, 2014

Analysis of Hinxton4 - ERS389798


Hinxton4, or ERS389798, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton4 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

I still don't know who these samples represent exactly, but in all likelihood, this is one of the two Iron Age sequences from this collection, and probably belongs to a Briton of Celtic stock. Note, for instance, its high affinity to the present-day Irish, relatively low North Sea score in the Eurogenes K15, and pronounced western shift on the second Principal Component Analysis (PCA) plot below.

Interestingly, Lithuanians top its shared drift list based on the Human Origins dataset and more than 360K SNPs. I'm not entirely sure what this means, but it's probably related in some way to the unusually high level (>45%) of indigenous European hunter-gatherer ancestry carried by Lithuanians.



Shared drift stats in the form f3(Mbuti;Hinxton4,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton4,Test) - Human Origins dataset
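For anyone wondering what these shared drift stats measure, here's a minimal sketch of an outgroup f3-statistic. It's not the pipeline used to produce the spreadsheets above. It's the naive estimator, the average of (o - a)(o - b) across SNPs, where o, a and b are the allele frequencies in the outgroup (Mbuti), the ancient sample and the Test population. It leaves out the finite-sample correction and the block jackknife standard errors that proper software applies. The function names and the toy frequencies are made up for illustration.

```python
import numpy as np


def outgroup_f3(freq_out, freq_a, freq_b):
    """Naive outgroup f3(Out; A, B): mean of (o - a) * (o - b) over SNPs.

    Simplified sketch: no finite-sample bias correction, no standard errors.
    SNPs with a missing frequency (NaN) in any population are skipped.
    """
    o = np.asarray(freq_out, dtype=float)
    a = np.asarray(freq_a, dtype=float)
    b = np.asarray(freq_b, dtype=float)
    if not (o.shape == a.shape == b.shape):
        raise ValueError("frequency arrays must have the same length")
    keep = ~(np.isnan(o) | np.isnan(a) | np.isnan(b))
    if not keep.any():
        raise ValueError("no SNPs with data in all three populations")
    return float(np.mean((o[keep] - a[keep]) * (o[keep] - b[keep])))


def rank_by_shared_drift(freq_out, freq_target, test_freqs):
    """Rank Test populations by f3(Out; Target, Test), highest first."""
    scores = {
        name: outgroup_f3(freq_out, freq_target, freqs)
        for name, freqs in test_freqs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy allele frequencies at four SNPs (invented, not real data).
    mbuti = [0.10, 0.80, 0.50, 0.30]
    ancient = [1.00, 0.00, 0.50, 1.00]  # a single genome: 0, 0.5 or 1
    tests = {
        "PopA": [0.90, 0.10, 0.60, 0.80],
        "PopB": [0.40, 0.60, 0.50, 0.50],
    }
    for name, f3 in rank_by_shared_drift(mbuti, ancient, tests):
        print(name, round(f3, 4))
```

The higher the value, the more drift the ancient sample and the Test population share since splitting from the outgroup, which is why the populations at the top of each list are read as the closest relatives.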



Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Friday, October 17, 2014

Analysis of Hinxton3 - ERS389797


Hinxton3, or ERS389797, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton3 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

Despite the exaggerated North Sea score in the Eurogenes K15, Hinxton3 could easily pass for a present-day Briton from the eastern coast of England or Scotland, albeit with a stronger than usual pull towards Scandinavia. Indeed, the f3-statistics show that it shares most genetic drift with the British and Icelanders from Eurogenes and Human Origins, respectively.



Shared drift stats in the form f3(Mbuti;Hinxton3,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton3,Test) - Human Origins dataset



Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton4 - ERS389798

Analysis of Hinxton2 - ERS389796


Hinxton2, or ERS389796, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton2 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

Interestingly, f3-statistics in the form f3(Mbuti;Hinxton2,Test) show that Hinxton2 shares most genetic drift with present-day Danes and Norwegians. Please refer to the relevant spreadsheets below.



Shared drift stats in the form f3(Mbuti;Hinxton2,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton2,Test) - Human Origins dataset



Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Sunday, October 12, 2014

Ancient genomes and the calculator effect


Several ancient genomes have been posted online as text files and uploaded to GEDmatch over the last couple of weeks, and many more are likely to follow in the future. A lot of people have already taken this opportunity to analyze these files with various online ancestry tools, usually DIY calculators.

That's actually not a bad way of doing things, as long as everyone's aware that almost all of these calculators produce biased results. They do so because they violate a very basic rule of science, which is this:
Do not test more than one variable at a time.
Obviously, the variable we want to test with these calculators is ancestry. However, when the reference samples are tested in a different way to the test samples, which is what usually happens, then this adds another variable to the proceedings. As a result, we simply can't compare the results of the reference samples to those of the test samples.

I know that a lot of people find this difficult to grasp, and many just seem hell-bent on not grasping it. However, anyone who isn't completely insane, and takes five minutes out of their day to try to understand the concepts involved, has to agree that this is a real problem. It can be proven empirically, as I did over two years ago (see here).

I suspect that a lot of confusion has been caused by the fact that the people who were used as reference samples in the making of the various DIY calculators saw highly accurate results when running them, and so assumed everything was fine. The accuracy of the DIY calculators for such people is indeed impressive, and I show that at the link above, but unfortunately the story is very different for everyone else.

Here's the good news: the Eurogenes calculators don't suffer from the calculator effect. That's because the reference samples are treated in the same way as the test samples, so there's only one variable: ancestry. What this means is that when you run a modern or ancient genome with a Eurogenes calculator you can confidently compare its results to those of the reference samples (provided enough SNPs are used), and then make sensible inferences about its genetic origins.

Wednesday, October 8, 2014

Analysis of an ancient genome from Hinxton


I've just added an ancient sample from Hinxton, England, to my burgeoning ancient genomes collection. It's a pre-publication release freely available here as ERS389795. Thanks to Felix C. for breaking the news. We've both called this sample Hinxton1.

Unfortunately, its archeological context is a mystery to me, but it's possibly one of the ancient genomes mentioned in the recent Schiffels et al. ASHG abstract (see here).

In terms of genome-wide genetic structure, Hinxton1 is most similar to present-day Orcadians, Irish, western Scots, Icelanders and western Norwegians, more or less in that order. However, it's fairly distinct from the modern inhabitants of England, or at least those in my datasets, who mostly come from Kent and Cornwall.

Please note, this analysis features two different datasets: Eurogenes and Human Origins. Eurogenes, which is my own dataset, includes more populations than Human Origins, and is based on SNPs used in commercial ancestry and medical work. On the other hand, Human Origins shows a more varied sampling strategy, and is based on SNPs specifically chosen for population genetics.




Shared drift stats in the form f3(Mbuti;Hinxton1,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton1,Test) - Human Origins dataset



Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Friday, October 3, 2014

Scratch the North Caucasus


I've just spotted a few interesting extras in the final draft of Lazaridis et al. that appeared in Nature last month, including this quote from page 126 of the freely available supp info:

The finding of high ANE ancestry in the North Caucasus might suggest that the Caucasus is a potential source of this type of ancestry in Europe. However, when we try to fit present-day Europeans as a 3-way mixture of a North Caucasian population+EEF+WHG in the structure of Fig. S14.20 this model is successful for only 5 populations (Bergamo, Bulgarian, Italian_South, Spanish_North, Tuscan using Lezgins as a sister group to the admixing population). Admixture from the Caucasus would need to be substantial to account for observed ANE levels in Europe (e.g., for a European population with ~15% ANE ancestry, almost half of its ancestry must come from a Lezgin-like population with ~29% ANE ancestry; this would account for the ANE ancestry but would greatly dilute its WHG-related ancestry, and yet present-day Europeans have increased affinity to WHG in Extended Data Fig. 4 relative to Stuttgart).
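The arithmetic behind the "almost half" figure in the quote is easy to check. Here's a rough sketch, assuming a simple linear mixture in which the Lezgin-like source is the only contributor of ANE and the other sources (EEF and WHG) carry none. The function name is mine, and the inputs are the approximate percentages given in the quote.

```python
def required_source_fraction(target_ane, source_ane):
    """Fraction of ancestry that must come from the ANE-carrying source.

    Assumes a linear mixture where the remaining sources carry no ANE,
    so target_ane = fraction * source_ane.
    """
    if not 0.0 < source_ane <= 1.0:
        raise ValueError("source_ane must be in (0, 1]")
    if not 0.0 <= target_ane <= source_ane:
        raise ValueError("target_ane must be between 0 and source_ane")
    return target_ane / source_ane


if __name__ == "__main__":
    # ~15% ANE in the European population, ~29% ANE in the Lezgin-like source.
    fraction = required_source_fraction(0.15, 0.29)
    print(f"Lezgin-like ancestry needed: {fraction:.0%}")
    print(f"Left over for EEF and WHG combined: {1 - fraction:.0%}")
```

Under these assumptions, roughly half of the population's ancestry would have to come from the Lezgin-like source, leaving only the other half to be shared between EEF and WHG. That's the dilution of WHG-related ancestry the authors are talking about.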

This was rather obvious anyway, but I know that there are a lot of people online who cherish the notion that Europe was invaded in a big way by groups from the Caucasus and/or Anatolia during the Bronze Age, and I'm guessing this paragraph was a response to the comments that the authors received from these people during the public review process.

Indeed, the updated supp info also has a couple of new Principal Component Analyses (PCA) of West Eurasian populations, with which Lazaridis et al. underline the point that most Europeans and Near Easterners form "two discontinuous clines" in such analyses (pages 76-80). I could be wrong, but the impression I get is that they're again communicating how very improbable it is for most Europeans to harbor any Near Eastern and Caucasian admixture that postdates the Neolithic transition.

This of course leaves pre-Turkic far Eastern Europe, Western Siberia and/or Central Asia as the source(s) of the ANE-rich population movements that apparently had such a profound impact on most of the European gene pool after the final Neolithic.

Citation...

Lazaridis et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, Nature, 513, 409–413 (18 September 2014), doi:10.1038/nature13673

See also...

Coming soon: genome-wide data from more than forty 3-9K year-old humans from the ancient Russian steppe

Corded Ware Culture linked to the spread of ANE across Europe

Thursday, September 18, 2014

PCA of Loschbour, Stuttgart, Motala12 and others


Lazaridis et al. has finally made it into a journal. As far as I can see, the version that was published today in Nature doesn't differ in any significant way from the last draft at arXiv (see here). However, now that the paper is officially out, the authors have released their dataset. It can be downloaded here.

Below are a few Principal Component Analyses (PCA) featuring most of the samples, including six ancient genomes, each tested separately. They are, in order of appearance, Loschbour, La Braña-1, Motala12, MA-1 (i.e. the Mal'ta boy), Stuttgart and Oetzi the Iceman.

The plots look somewhat different from my previous efforts, even those with the same ancient genomes, such as here. That's mainly because this dataset offers more markers (for example, a whopping 427K versus 140K for MA-1), and often much higher quality markers chosen specifically for population genetics. Indeed, all of the plots below are based on around 200K SNPs, and that's after various quality controls. That's an improvement of well over 100K SNPs compared to many of the analyses I ran in the past.

















Citation...

Lazaridis et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, Nature, 513, 409–413 (18 September 2014), doi:10.1038/nature13673

See also...

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

Another look at the Lazaridis et al. ancient genomes preprint

Scratch the North Caucasus

The really old Europe is mostly in Eastern Europe