Tuesday, July 22, 2014

f3-stats: 100 present-day populations plus MA-1

Here's a list of f3-statistics featuring 100 present-day populations, mostly from Eurasia, and MA-1, the now famous 24,000 year-old genome from south Siberia. The test was based on 125K SNPs and run with the threepop program, which is part of the TreeMix package.

Threepop stats (8MB download)

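f3-statistics are used to test for admixture: if the f3 statistic is significantly negative, then the test group is considered to be admixed. A result that isn't negative doesn't rule out admixture, though, because strong drift in the test group can mask the signal.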
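For readers who want to see the logic of the test, here is a minimal Python sketch of an f3 statistic with a Z score. It is not the threepop code: it uses the naive per-SNP term (target - ref1) * (target - ref2) without the finite-sample correction that real implementations apply, and it gets the standard error from a simple delete-one-block jackknife, assuming contiguous blocks of roughly equal size. The function name, the block count and the toy allele frequencies are all made up for illustration.

```python
import math
import random

def f3_with_z(target, ref1, ref2, n_blocks=20):
    """Naive f3(target; ref1, ref2) with a block-jackknife SE and Z score.

    Inputs are equal-length lists of allele frequencies, one entry per SNP,
    assumed to be ordered along the genome so contiguous blocks make sense.
    """
    terms = [(t - a) * (t - b) for t, a, b in zip(target, ref1, ref2)]
    n = len(terms)
    total = sum(terms)
    f3 = total / n

    # Delete-one-block jackknife: recompute the mean with each block left out.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = []
    for start, end in zip(bounds, bounds[1:]):
        block_sum = sum(terms[start:end])
        leave_one_out.append((total - block_sum) / (n - (end - start)))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    return f3, se, f3 / se

# Toy data (made up): the target sits exactly halfway between the references,
# which is the textbook case of a 50/50 mixture with no drift afterwards.
rng = random.Random(1)
ref1 = [rng.uniform(0.05, 0.95) for _ in range(2000)]
ref2 = [rng.uniform(0.05, 0.95) for _ in range(2000)]
target = [(a + b) / 2 for a, b in zip(ref1, ref2)]

f3, se, z = f3_with_z(target, ref1, ref2)
print(f"f3={f3:.5f} se={se:.5f} Z={z:.1f}")
```

In this toy case every per-SNP term is zero or negative, so the statistic comes out clearly negative. With real data, drift in the test group adds a positive term that can cancel the signal.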

Interestingly, the lowest f3-statistic in most cases involves MA-1. That's probably because it's the only ancient genome in this dataset, but very likely also because Ancient North Eurasian (ANE) groups related to MA-1 contributed considerable gene flow to the vast majority of populations across Eurasia and the Americas.

Please note, however, that I organized each population by its Z-scores, from lowest to highest. I thought these looked more interesting in most cases than the f3-stats because they appeared to point to more recent admixture events.

Unfortunately, the thing about f3 tests is that they're limited to just two reference groups, when very often three or four references would be more helpful. Nevertheless, the results are still very useful if you actually know what to do with them. For instance, below is a list of shared drift statistics between the aforementioned present-day populations and MA-1, of the form of f3(Mbuti;MA-1,Test). The Karitiana Indians of the Amazon easily top this list, followed by various Siberian groups and then Northeast Europeans. This makes good sense based on everything we've seen to date.

Shared drift with MA-1 (spreadsheet)

Below is a graph of f3(Mbuti;MA-1,Test) against f3(Mbuti;Dai,Test), which I think is a useful way to compare ANE versus ENA (eastern non-African or just East Eurasian) influence across Eurasia. The datasheet, which is in PAST format and includes the full key, can be downloaded here.

Note how the West Eurasians form a sloping cline that runs from the Arabian Peninsula to the Eastern Baltic. Keep in mind, however, that the MA-1 statistic inflates the Dai statistic, which is why the cline is a sloping one, and only the samples that clearly deviate from this cline towards the right harbor ENA admixture. Moreover, the so called Basal Eurasian and Sub-Saharan ancestry depress both statistics, which is probably why, for instance, the Makrani, who are documented to be of partly Sub-Saharan origin, are outliers within the Central Asian sample set.

Despite the fact that Northeast Europeans are at the top of the graph, they don't actually carry the most significant levels of ANE. That's because their affinity to MA-1 is in large part mediated via so called Western European Hunter-Gatherer (WHG) ancestry. So if we assume that WHG is not found in Asia, as per Lazaridis et al. 2013, then it's the Kalash and Lezgins of the Hindu Kush and Northeast Caucasus, respectively, who come out the most ANE here.

Saturday, July 12, 2014

TreeMix stuff: looking for MA-1 related ancestry in the Hindu Kush

I finally got around to installing TreeMix. It's a cool little program for drawing up phylogenetic trees and inferring migration edges or admixture events from genome-wide allele frequency data. It seems to be popular with scientists working with ancient DNA and has featured recently in several important papers on ancient genomes.

I'm still learning how to use it and interpret the output, but with that in mind, here are some TreeMix graphs from tests designed to look for MA-1 or Ancient North Eurasian (ANE) related ancestry among the Kalash of the Hindu Kush. This is an interesting issue, considering the isolated nature of the Kalash, the archaic form of Indo-Iranian spoken by them, and the recent talk of ANE potentially being a genetic signal of the Proto-Indo-Europeans (see here).

The values for the migration edges shown in each tree can be downloaded here. The trees are unsupervised except for the San bushmen being specified each time as the outgroup. The first tree was produced with 130K SNPs and the rest with a higher quality subset of 30K SNPs (at read depth of 3x or above in the MA-1 genome). I found that using the higher quality markers kept the MA-1 branch at a respectable length and also made for more sensible trees when running with five or more migration edges.

Based on these results I'd say that the Kalash do harbor significant MA-1 related ancestry. The first tree, featuring 18 populations and four migration edges, shows that it might be as high as 43%.

However, after the extra populations and migration edges are added only the Karitiana Indians of the Amazon and Evens of Siberia show direct signals of admixture from the branch leading to MA-1. The Kalash, on the other hand, receive admixture from points located near the root of the MA-1 branch. I can't say with any great certainty what this means, but I suspect it shows that ANE arrived in the Hindu Kush in different waves and in a mixed form.

Interestingly, when the Turkic-speaking Chuvash are added they're shown to be partly East Eurasian, with 20-24% admixture from the ancestors of the Evens. This makes sense, because the early Turks are thought to have originated somewhere east of the Altai, possibly in south Siberia.

By the way, I have already looked at the issue of ANE ancestry among present-day Asians using the ADMIXTURE software, and came up with a figure of 33.5% for a Kalash individual (see here). At the moment, I'm not sure whether my TreeMix results invalidate this figure. Any thoughts?

Update 20/07/2014: The TreeMix package also offers a threepop test, which computes f3 statistics of the form of f3(Test;Reference1,Reference2). The test population is considered to be admixed if the f3 statistic is significantly negative. I took advantage of this option to look for admixture in 42 Eurasian populations, including the Kalash and Chamar, plus MA-1, using just over 127K SNPs.

None of the f3 stats for the Kalash proved to be negative, let alone significant, while the Chamar produced just one negative stat, as a mixture between the Sakilli of South India and MA-1, and with a Z score of -0.68 it falls well short of significance. However, that's probably because the threepop test has a hard time dealing with genetic isolates and unusually endogamous groups, like the Kalash and Chamar, respectively. Below are their ten lowest f3 stats, each followed by its standard error and Z score.

Kalash;MA-1,Samaritan 0.00487064 0.000664486 7.32994
Kalash;MA-1,Armenian 0.00547845 0.000401906 13.6312
Kalash;MA-1,Georgian 0.00553588 0.000434261 12.7478
Kalash;Sardinian,MA-1 0.00557093 0.000490111 11.3667
Kalash;Abhkasian,MA-1 0.00621502 0.000421821 14.7338
Kalash;MA-1,Kurdish 0.00631716 0.000473973 13.3281
Kalash;MA-1,Iranian 0.0065704 0.000427014 15.3868
Kalash;Georgian,Dai 0.00702268 0.000239 29.3836
Kalash;North_Ossetian,MA-1 0.0070735 0.000460335 15.366
Kalash;Dai,Armenian 0.00723076 0.000233725 30.937

Chamar;Sakilli,MA-1 -0.000333833 0.000487363 -0.684979
Chamar;Dai,Armenian 0.000146969 0.000240917 0.610039
Chamar;Makrani,Dai 0.00020635 0.000211436 0.975946
Chamar;Punjabi_Jat,Dai 0.000217682 0.000260224 0.836519
Chamar;Balochi,Dai 0.000229282 0.00021638 1.05963
Chamar;Georgian,Dai 0.000260187 0.000251537 1.03439
Chamar;Dai,Samaritan 0.000304204 0.000340771 0.892692
Chamar;Brahui,Dai 0.000326537 0.000210435 1.55172
Chamar;Gujarati,Dai 0.000346848 0.000162122 2.13942
Chamar;Dai,Kurdish 0.000404303 0.000267694 1.51032
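Rows like the ones above are easy to work with in a few lines of Python. The sketch below parses a handful of them, sorts by Z score, checks that each Z score equals f3 divided by its standard error, and flags anything under a cutoff. The cutoff of -3 is my assumption of a commonly used threshold, and the function and variable names are just for illustration.

```python
def parse_f3_line(line):
    """Split a 'Test;Ref1,Ref2 f3 se z' row into a dict."""
    label, f3, se, z = line.split()
    test, refs = label.split(";")
    ref1, ref2 = refs.split(",")
    return {
        "test": test,
        "refs": (ref1, ref2),
        "f3": float(f3),
        "se": float(se),
        "z": float(z),
    }

# Four rows copied from the lists above.
lines = [
    "Kalash;MA-1,Samaritan 0.00487064 0.000664486 7.32994",
    "Kalash;Dai,Armenian 0.00723076 0.000233725 30.937",
    "Chamar;Sakilli,MA-1 -0.000333833 0.000487363 -0.684979",
    "Chamar;Dai,Armenian 0.000146969 0.000240917 0.610039",
]

# Lowest Z score first, which is how the full output was organized.
records = sorted((parse_f3_line(l) for l in lines), key=lambda r: r["z"])

Z_CUTOFF = -3.0  # assumed significance threshold, not taken from threepop
significant = [r for r in records if r["z"] < Z_CUTOFF]

for r in records:
    recomputed = r["f3"] / r["se"]
    print(r["test"], r["refs"], f"Z={r['z']:.2f}", f"f3/se={recomputed:.2f}")
print("significant:", significant)
```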

As expected, all of the Europeans in this run were clearly best characterized as mixtures between Sardinians and MA-1, apart from the Turkic-speaking Chuvash from far Eastern Europe, who were best fitted as a mixture of Lithuanians and the Dai from southern China, and the Sardinians, who were found to be unmixed. The full output from the test can be downloaded at the link below.

Threepop test: 42 populations plus MA-1 (127K SNPs)

Wednesday, July 9, 2014

More ancient genomes from Sweden: Pitted Ware forager Ajvide58 and TRB farm girl Gokhem2

Ajvide58 is a male Neolithic forager from Gotland, dated to 4,900-4,600 cal. years B.P., belonging to the Pitted Ware culture, and carrying Y-chromosome haplogroup I2 (most likely I2a1) and mitochondrial (mtDNA) haplogroup U4d. Gokhem2 is a female Neolithic farmer from mainland southern Sweden, dated to 5,050-4,750 cal. B.P., belonging to the Trichterbecher Kultur (TRB culture), and carrying mtDNA haplogroup H1c. Both of these genomes were published earlier this year by Skoglund et al. 2014 (see here).

My analysis shows that Ajvide58 is very similar to Mesolithic Swedish forager StoraFörvar11 (see here), and also in part Ancient North Eurasian (ANE). This can be seen in the 4 Ancestors Oracle results in which one of the best fits for Ajvide58 is 3/4 Iberian Mesolithic forager La Brana-1 and 1/4 Upper Paleolithic Siberian forager and ANE proxy MA-1.

However, the Eurogenes K15 ancestry proportions suggest to me that the level of ANE in this sample is lower than in StoraFörvar11. That's because Ajvide58 shows less of the Eastern European component (16.87% vs. 23.23%), and none of the South Asian component. These two components, along with the Amerindian component, dominate MA-1's K15 results (see here).

On the other hand, Gokhem2 appears not to harbor any ANE ancestry; note the complete lack of the Eastern Euro, Amerindian and South Asian components in her K15 proportions, and absence of MA-1 in the Oracle results. This is in line with all scientific literature to date, which indicates that ANE was basically missing from Western and Central Europe during the Mesolithic and Neolithic. Indeed, this sample's best matching population in the Oracle are the Sardinians, one of the few present-day European groups without any detectable ANE admixture.

The absence of ANE in Gokhem2 and all other ancient European genomes from a farming context, like Stuttgart and Oetzi, is a very important point. That's because Neolithic farmers largely replaced indigenous hunter-gatherers across most of Europe, including in Scandinavia. As a result, it's probably safe to assume that this process reduced the amount of ANE in Scandinavia to much less than what was carried there by the indigenous foragers (15-19%). However, present-day Scandinavians carry around 17% of ANE, which must mean that there was another migration wave into Northern Europe after the Neolithic, coming from an area rich in ANE. This was probably the Indo-European expansion from the middle Volga region (see here).
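The dilution argument is simple two-source arithmetic, which can be sketched in Python. The 15-19% forager range and the ANE-free farmers come from the text above; the forager shares in the loop are hypothetical values picked purely for illustration, and the function name is mine.

```python
def ane_after_mixing(forager_share, forager_ane, farmer_ane=0.0):
    """ANE fraction in a population mixed from foragers and farmers."""
    return forager_share * forager_ane + (1 - forager_share) * farmer_ane

for forager_ane in (0.15, 0.19):           # range quoted for Scandinavian foragers
    for forager_share in (0.25, 0.5, 0.75):  # hypothetical forager shares
        mixed = ane_after_mixing(forager_share, forager_ane)
        print(f"forager ANE {forager_ane:.0%}, forager share {forager_share:.0%}"
              f" -> mixed ANE {mixed:.1%}")
```

Whenever the farmers contribute anything at all, the mixed value ends up below the forager value, so a present-day figure of around 17% is hard to reach without an extra ANE-rich source.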

Nevertheless, Gokhem2 does have forager admixture, which can be seen in her non-trivial levels of Eurogenes K15 components associated with indigenous European forager ancestry: North Sea 12.65%, Southeast Asian 5.22%, Baltic 5.06%, Oceanian 4% and Siberian 2.3%. What this suggests is that the admixture event between the Near Eastern and European ancestors of the TRB farmers didn't take place in Scandinavia, but rather somewhere on the European mainland where ANE wasn't present at the time. Again, the Oracle results are in agreement, because they feature La Brana-1 well ahead of Ajvide58.

Eurogenes K15 for Ajvide58

North_Sea 31.02
Atlantic 13.58
Baltic 23.11
Eastern_Euro 16.87
West_Med 0.38
West_Asian 0
East_Med 0
Red_Sea 0
South_Asian 0
Southeast_Asian 4.44
Siberian 2.1
Amerindian 5.8
Oceanian 2.39
Northeast_African 0.31
Sub-Saharan 0

4 Ancestors Oracle results (with StoraFörvar11)

4 Ancestors Oracle results (without StoraFörvar11)

Principal Component Analyses (PCA) featuring West Eurasian, Eurasian and global reference sets, respectively, show that Ajvide58 is outside the range of modern West Eurasian genetic variation, which is in line with the results of all other ancient European foragers sequenced to date. The cross marks the spot (click on the images to download high resolution PDFs of the plots):

Eurogenes K15 for Gokhem2

North_Sea 12.65
Atlantic 21.49
Baltic 5.06
Eastern_Euro 0
West_Med 38.42
West_Asian 0
East_Med 8.19
Red_Sea 2.47
South_Asian 0
Southeast_Asian 5.22
Siberian 2.3
Amerindian 0
Oceanian 4
Northeast_African 0.21
Sub-Saharan 0

4 Ancestors Oracle results for Gokhem2

The PCA basically show the same outcomes, with the TRB farm girl positioned just north of present-day Sardinians on the West Eurasian plot, between the Near East and Northern Europe on the Eurasian plot, and with Lithuanians on the global plot.

The Eurogenes K15 and Alexandr Burnashev's 4 Ancestors Oracle are available for use free of charge at GEDmatch for anyone with genotype data from 23andMe and similar personal genomics companies. Look for the Ad-mix option and then the Eurogenes tab.

See also...

PCA of five ancient genomes

4 Ancestors Oracle results for Anzick-1, La Brana-1 and MA-1

Saturday, July 5, 2014

Analysis of Mesolithic Swedish forager StoraFörvar11

StoraFörvar11, or SfF11, is a late Mesolithic genome from a cave on the small island of Stora Karlsö, just off the coast of Gotland. It was published earlier this year by Skoglund et al. along with several other ancient genomes dating to the Neolithic from Gotland and mainland Sweden (see here). Belonging to Northeast European-specific mitochondrial haplogroup U5a1, SfF11 appears to be the archetypal Scandinavian forager, with no detectable Neolithic farmer admixture but considerable Ancient North Eurasian (ANE) ancestry related to Upper Paleolithic hunter-gatherers from Siberia, such as MA-1 and AG2 (see here).

Please note, SfF11 was superimposed onto the first Principal Component Analysis (PCA) plot below, which initially only included La Brana-1, the ancient Mesolithic genome from northern Spain, and present-day West Eurasians. I did this to avoid creating a cluster with the two ancient genomes based not on genuine genetic affinities between them but on their relatively poor quality. I obtained the PC coordinates for SfF11 from an almost identical 13K SNP PCA plot which can be seen here.

Note also the clear eastern affinity shown by SfF11 relative to La Brana-1, which in all likelihood is the result of the above mentioned shared ANE ancestry with MA-1, featured on the second PCA. To date, all ancient genomes from Western and Central Europe have basically lacked this admixture, while Scandinavian hunter-gatherers carried it at levels of 15-19%. As hypothesized by Lazaridis et al. 2013, it's likely that Eastern European hunter-gatherers harbored even greater levels of ANE, and it's probably a good bet that they introduced it into Scandinavia during and/or before the Mesolithic.

I also ran a couple of PCA with reference samples from across North Eurasia and the globe. Both were based on 14K SNPs.

Below are the Eurogenes K15 ancestry proportions for SfF11, and below that the 4 Ancestors Oracle results. Even though the K15 test was based on just 8K SNPs, the outcome appears robust, and correlates closely with results from more sophisticated formal mixture tests in scientific literature, in which European hunter-gatherers show a strong relationship to present-day East Baltic populations, especially Lithuanians. Moreover, among the best 4-way Oracle fits for SfF11 is 3/4 La Brana-1 and 1/4 MA-1, which is extremely close to the actual genetic structure of Scandinavian foragers: around 80% Western European Hunter-Gatherer (WHG) and around 20% ANE.

The unusually high South and Southeast Asian scores can probably be explained by shared ANE ancestry with South Asians and lack of the so called Basal Eurasian admixture, respectively. Indeed, the latter is a very good bet considering the complete absence of any sort of Mediterranean and Near Eastern signals in these results.

Eurogenes K15

Baltic 29.24
North_Sea 23.97
Eastern_Euro 23.23
Southeast_Asian 5.97
Atlantic 5.62
Amerindian 4.52
South_Asian 4.36
Oceanian 2.17
Northeast_African 0.58
Siberian 0.34
West_Med 0
West_Asian 0
East_Med 0
Red_Sea 0
Sub-Saharan 0

4 Ancestors Oracle

Least-squares method.

Using 1 population approximation:
1 Estonian @ 14.153281
2 Erzya @ 14.620788
3 Kargopol_Russian @ 14.700492
4 Southwest_Russian @ 15.448751
5 Ukrainian @ 15.825631
6 Lithuanian @ 15.842059
7 Ukrainian_Belgorod @ 16.110345
8 East_Finnish @ 16.435534
9 Belorussian @ 16.531115
10 Ukrainian_Lviv @ 16.638975
11 Estonian_Polish @ 16.671571
12 Polish @ 17.379799
13 South_Polish @ 17.805012
14 Russian_Smolensk @ 17.812963
15 Finnish @ 18.279374
16 La_Brana-1 @ 19.903407
17 Southwest_Finnish @ 21.942936
18 Moldavian @ 23.158096
19 Croatian @ 23.266324
20 Hungarian @ 24.020402

Using 2 populations approximation:
1 Erzya+Estonian @ 12.292066
2 Estonian+Kargopol_Russian @ 13.190123
3 Erzya+La_Brana-1 @ 13.192429
4 Erzya+Lithuanian @ 13.414829
5 Erzya+Ukrainian @ 13.440955
6 Erzya+Ukrainian_Lviv @ 13.540859
7 Erzya+Finnish @ 13.602815
8 East_Finnish+Lithuanian @ 13.693698
9 Kargopol_Russian+Lithuanian @ 13.735122
10 Estonian+Southwest_Russian @ 13.994994
11 East_Finnish+Erzya @ 14.077424
12 Estonian+Ukrainian_Belgorod @ 14.113102
13 Kargopol_Russian+Ukrainian @ 14.126683
14 Estonian+Estonian @ 14.153281
15 Belorussian+Erzya @ 14.180946
16 Erzya+Southwest_Russian @ 14.186181
17 Kargopol_Russian+Ukrainian_Lviv @ 14.247527
18 Estonian+Ukrainian @ 14.247854
19 Erzya+Polish @ 14.291491
20 Estonian+Lithuanian @ 14.31161

Using 3 populations approximation:
1 50% Estonian +25% Lithuanian +25% MA-1 @ 11.982448
2 50% Lithuanian +25% Estonian +25% MA-1 @ 12.169832
3 50% Estonian +25% Estonian +25% MA-1 @ 12.225538
4 50% Erzya +25% Estonian +25% La_Brana-1 @ 12.250755
5 50% Erzya +25% Estonian +25% Estonian @ 12.292066
6 50% Lithuanian +25% La_Brana-1 +25% MA-1 @ 12.473574
7 50% Erzya +25% La_Brana-1 +25% Lithuanian @ 12.480595
8 50% Lithuanian +25% Finnish +25% MA-1 @ 12.547096
9 50% Erzya +25% Estonian +25% Ukrainian_Lviv @ 12.657215
10 50% Erzya +25% Estonian +25% Ukrainian @ 12.660239
11 50% Erzya +25% Estonian +25% Lithuanian @ 12.661794
12 50% Estonian +25% Erzya +25% Kargopol_Russian @ 12.679962
13 50% Erzya +25% Erzya +25% La_Brana-1 @ 12.695461
14 50% Erzya +25% La_Brana-1 +25% Ukrainian @ 12.707643
15 50% Estonian +25% Erzya +25% Estonian @ 12.716859
16 50% Erzya +25% Finnish +25% Lithuanian @ 12.72455
17 50% Erzya +25% Estonian +25% Finnish @ 12.737834
18 50% Erzya +25% La_Brana-1 +25% Ukrainian_Lviv @ 12.753404
19 50% Lithuanian +25% Lithuanian +25% MA-1 @ 12.768751
20 50% Estonian +25% Belorussian +25% MA-1 @ 12.780747

Using 4 populations approximation:
1 Estonian+Estonian+Lithuanian+MA-1 @ 11.982448
2 Estonian+Lithuanian+Lithuanian+MA-1 @ 12.169832
3 Estonian+Estonian+Estonian+MA-1 @ 12.225538
4 Erzya+Erzya+Estonian+La_Brana-1 @ 12.250755
5 Erzya+Erzya+Estonian+Estonian @ 12.292066
6 Estonian+La_Brana-1+Lithuanian+MA-1 @ 12.434074
7 La_Brana-1+Lithuanian+Lithuanian+MA-1 @ 12.473574
8 Erzya+Erzya+La_Brana-1+Lithuanian @ 12.480595
9 Finnish+Lithuanian+Lithuanian+MA-1 @ 12.547096
10 Erzya+Erzya+Estonian+Ukrainian_Lviv @ 12.657215
11 Erzya+Erzya+Estonian+Ukrainian @ 12.660239
12 Estonian+Lithuanian+MA-1+Ukrainian @ 12.66118
13 Erzya+Erzya+Estonian+Lithuanian @ 12.661794
14 Erzya+Estonian+Estonian+Kargopol_Russian @ 12.679962
15 Erzya+Erzya+Erzya+La_Brana-1 @ 12.695461
16 Estonian+Lithuanian+MA-1+Ukrainian_Lviv @ 12.697136
17 Erzya+Erzya+La_Brana-1+Ukrainian @ 12.707643
18 Erzya+Estonian+Estonian+Estonian @ 12.716859
19 Erzya+Erzya+Finnish+Lithuanian @ 12.72455
20 Erzya+Erzya+Estonian+Finnish @ 12.737834
21 Estonian+Finnish+Lithuanian+MA-1 @ 12.746305
22 Erzya+Erzya+La_Brana-1+Ukrainian_Lviv @ 12.753404
23 Lithuanian+Lithuanian+Lithuanian+MA-1 @ 12.768751
24 Belorussian+Estonian+Estonian+MA-1 @ 12.780747
25 Estonian+Estonian+MA-1+Ukrainian @ 12.797031
26 Estonian+Estonian+La_Brana-1+MA-1 @ 12.807529
27 Erzya+Estonian+Estonian+Ukrainian @ 12.813496
28 Estonian+Estonian+MA-1+Ukrainian_Lviv @ 12.822931
29 Erzya+Estonian+Kargopol_Russian+La_Brana-1 @ 12.831473
30 Erzya+Estonian+Estonian+Lithuanian @ 12.839613
31 Chuvash+Estonian+Estonian+Lithuanian @ 12.851803
32 Belorussian+Estonian+Lithuanian+MA-1 @ 12.855733
33 Erzya+Estonian+Estonian+Ukrainian_Lviv @ 12.857349
34 East_Finnish+Erzya+Estonian+Lithuanian @ 12.875013
35 Erzya+Estonian+Kargopol_Russian+Lithuanian @ 12.901956
36 Erzya+Estonian+La_Brana-1+Lithuanian @ 12.90565
37 Erzya+Kargopol_Russian+La_Brana-1+Lithuanian @ 12.914481
38 Erzya+Estonian+Estonian+La_Brana-1 @ 12.921321
39 Erzya+Estonian+Estonian+Southwest_Russian @ 12.931952
40 Lithuanian+Lithuanian+MA-1+Ukrainian @ 12.932804

Gaussian method.

Using 1 population approximation:
1 East_Finnish @ 12.111642
2 Finnish @ 12.136433
3 Tatar @ 12.260871
4 Chuvash @ 12.287812
5 Kargopol_Russian @ 13.238854
6 Erzya @ 13.290701
7 Ukrainian @ 14.224517
8 North_Swedish @ 14.501487
9 Mari @ 14.582022
10 La_Brana-1 @ 15.102585
11 Ukrainian_Lviv @ 15.466692
12 Moldavian @ 16.561361
13 Ukrainian_Belgorod @ 16.829215
14 Southwest_Finnish @ 17.044556
15 Southwest_Russian @ 17.644306
16 Estonian_Polish @ 17.912619
17 Swedish @ 18.055712
18 Estonian @ 18.417704
19 Hungarian @ 18.442869
20 Lithuanian @ 18.500045

Using 2 populations approximation:
1 La_Brana-1+Mari @ 9.086839
2 Kargopol_Russian+La_Brana-1 @ 9.216681
3 La_Brana-1+MA-1 @ 9.529079
4 Chuvash+La_Brana-1 @ 9.628936
5 Erzya+La_Brana-1 @ 9.741056
6 La_Brana-1+Tatar @ 10.312023
7 East_Finnish+La_Brana-1 @ 10.369729
8 Chuvash+Estonian @ 10.38245
9 Estonian+La_Brana-1 @ 10.698394
10 Chuvash+Finnish @ 10.701826
11 Estonian+Tatar @ 10.72273
12 Estonian+Shors @ 10.734028
13 Chuvash+Lithuanian @ 10.781409
14 Chuvash+East_Finnish @ 10.832523
15 Chuvash+Kargopol_Russian @ 11.058841
16 Finnish+Tatar @ 11.078731
17 East_Finnish+Tatar @ 11.104768
18 Lithuanian+Shors @ 11.131471
19 Chuvash+Ukrainian @ 11.241182
20 Estonian+Hakas @ 11.257456

Using 3 populations approximation:
1 50% La_Brana-1 +25% Estonian +25% MA-1 @ 6.880967
2 50% La_Brana-1 +25% La_Brana-1 +25% MA-1 @ 7.035486
3 50% La_Brana-1 +25% Lithuanian +25% MA-1 @ 7.1341
4 50% Estonian +25% La_Brana-1 +25% MA-1 @ 7.18973
5 50% La_Brana-1 +25% East_Finnish +25% MA-1 @ 7.57191
6 50% La_Brana-1 +25% Finnish +25% MA-1 @ 7.600389
7 50% Lithuanian +25% La_Brana-1 +25% MA-1 @ 7.628929
8 50% La_Brana-1 +25% Estonian_Polish +25% MA-1 @ 7.697983
9 50% La_Brana-1 +25% Belorussian +25% MA-1 @ 7.70291
10 50% La_Brana-1 +25% Kargopol_Russian +25% MA-1 @ 7.781779
11 50% La_Brana-1 +25% MA-1 +25% Southwest_Finnish @ 7.798672
12 50% La_Brana-1 +25% Erzya +25% MA-1 @ 7.80171
13 50% La_Brana-1 +25% MA-1 +25% Polish @ 7.929863
14 50% La_Brana-1 +25% MA-1 +25% Southwest_Russian @ 7.935151
15 50% La_Brana-1 +25% MA-1 +25% Russian_Smolensk @ 8.031297
16 50% La_Brana-1 +25% MA-1 +25% North_Swedish @ 8.049602
17 50% La_Brana-1 +25% MA-1 +25% Ukrainian_Belgorod @ 8.049701
18 50% La_Brana-1 +25% MA-1 +25% Ukrainian @ 8.06409
19 50% La_Brana-1 +25% MA-1 +25% South_Polish @ 8.188305
20 50% Finnish +25% La_Brana-1 +25% MA-1 @ 8.237496

Using 4 populations approximation:
1 Estonian+La_Brana-1+La_Brana-1+MA-1 @ 6.880967
2 La_Brana-1+La_Brana-1+La_Brana-1+MA-1 @ 7.035486
3 La_Brana-1+La_Brana-1+Lithuanian+MA-1 @ 7.1341
4 Estonian+Estonian+La_Brana-1+MA-1 @ 7.18973
5 Estonian+La_Brana-1+Lithuanian+MA-1 @ 7.414412
6 East_Finnish+La_Brana-1+La_Brana-1+MA-1 @ 7.57191
7 Finnish+La_Brana-1+La_Brana-1+MA-1 @ 7.600389
8 La_Brana-1+Lithuanian+Lithuanian+MA-1 @ 7.628929
9 Estonian+Finnish+La_Brana-1+MA-1 @ 7.689347
10 Estonian_Polish+La_Brana-1+La_Brana-1+MA-1 @ 7.697983
11 Belorussian+La_Brana-1+La_Brana-1+MA-1 @ 7.70291
12 East_Finnish+Estonian+La_Brana-1+MA-1 @ 7.712903
13 Finnish+La_Brana-1+Lithuanian+MA-1 @ 7.779771
14 Kargopol_Russian+La_Brana-1+La_Brana-1+MA-1 @ 7.781779
15 La_Brana-1+La_Brana-1+MA-1+Southwest_Finnish @ 7.798672
16 Erzya+La_Brana-1+La_Brana-1+MA-1 @ 7.80171
17 East_Finnish+La_Brana-1+Lithuanian+MA-1 @ 7.850763
18 Estonian+Estonian_Polish+La_Brana-1+MA-1 @ 7.890161
19 Belorussian+Estonian+La_Brana-1+MA-1 @ 7.906509
20 Estonian+La_Brana-1+MA-1+Southwest_Finnish @ 7.927839
21 La_Brana-1+La_Brana-1+MA-1+Polish @ 7.929863
22 La_Brana-1+La_Brana-1+MA-1+Southwest_Russian @ 7.935151
23 Estonian+Kargopol_Russian+La_Brana-1+MA-1 @ 7.940811
24 Erzya+Estonian+La_Brana-1+MA-1 @ 7.965223
25 La_Brana-1+Lithuanian+MA-1+Southwest_Finnish @ 7.991558
26 La_Brana-1+Lithuanian+MA-1+North_Swedish @ 8.029449
27 La_Brana-1+La_Brana-1+MA-1+Russian_Smolensk @ 8.031297
28 Belorussian+La_Brana-1+Lithuanian+MA-1 @ 8.038993
29 Estonian_Polish+La_Brana-1+Lithuanian+MA-1 @ 8.046271
30 La_Brana-1+La_Brana-1+MA-1+North_Swedish @ 8.049602
31 La_Brana-1+La_Brana-1+MA-1+Ukrainian_Belgorod @ 8.049701
32 La_Brana-1+La_Brana-1+MA-1+Ukrainian @ 8.06409
33 Estonian+La_Brana-1+MA-1+North_Swedish @ 8.075392
34 Estonian+La_Brana-1+MA-1+Polish @ 8.08945
35 Kargopol_Russian+La_Brana-1+Lithuanian+MA-1 @ 8.100132
36 Estonian+La_Brana-1+MA-1+Southwest_Russian @ 8.108852
37 Erzya+La_Brana-1+Lithuanian+MA-1 @ 8.127814
38 Estonian+La_Brana-1+MA-1+Ukrainian @ 8.153751
39 La_Brana-1+Lithuanian+MA-1+Polish @ 8.17359
40 La_Brana-1+La_Brana-1+MA-1+South_Polish @ 8.188305
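To give a rough idea of how fits like these can be scored, here is a Python sketch. The Oracle's internals aren't described in this post, so two things are assumptions on my part: that a 4-way fit is an equal-weight average of four slots with repeats allowed (the lists above suggest this, since "50% Estonian +25% Lithuanian +25% MA-1" and "Estonian+Estonian+Lithuanian+MA-1" share the same score), and that the score is a plain Euclidean distance between K15 vectors. Because the reference populations' K15 values aren't given here, the toy run models SfF11 from the only other K15 vectors on this page, Ajvide58 and Gokhem2, so it is not a real Oracle result.

```python
import math
from itertools import combinations_with_replacement

# Component order: North_Sea, Atlantic, Baltic, Eastern_Euro, West_Med,
# West_Asian, East_Med, Red_Sea, South_Asian, Southeast_Asian, Siberian,
# Amerindian, Oceanian, Northeast_African, Sub-Saharan
K15 = {
    "SfF11":    [23.97, 5.62, 29.24, 23.23, 0, 0, 0, 0,
                 4.36, 5.97, 0.34, 4.52, 2.17, 0.58, 0],
    "Ajvide58": [31.02, 13.58, 23.11, 16.87, 0.38, 0, 0, 0,
                 0, 4.44, 2.1, 5.8, 2.39, 0.31, 0],
    "Gokhem2":  [12.65, 21.49, 5.06, 0, 38.42, 0, 8.19, 2.47,
                 0, 5.22, 2.3, 0, 4, 0.21, 0],
}

def distance(u, v):
    """Euclidean distance between two ancestry-proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def rank_four_way_fits(target, references, k15=K15):
    """Score every 4-slot mixture (25% per slot, repeats allowed)."""
    fits = []
    for combo in combinations_with_replacement(sorted(references), 4):
        mix = [sum(k15[name][i] for name in combo) / 4
               for i in range(len(k15[target]))]
        fits.append((distance(k15[target], mix), combo))
    return sorted(fits)

fits = rank_four_way_fits("SfF11", ["Ajvide58", "Gokhem2"])
for d, combo in fits:
    print("+".join(combo), "@", round(d, 6))
```

With a real reference panel the number of combinations grows quickly, so an exhaustive loop like this only suits small panels.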


Thursday, June 5, 2014

Coming soon: genome-wide data from more than forty 3-9K year-old humans from the ancient Russian steppe

This here is a presentation abstract from the upcoming SMBE 2014 conference. I simply can't wait to see the paper, which I'm guessing will be published very soon.

A central challenge in ancient DNA research is that for many bones that contain genuine DNA, the great majority of molecules in sequencing libraries are microbial. Thus, it has been impractical to carry out whole genome analyses of substantial numbers of ancient individuals. We report a strategy for in-solution capture of ancient DNA from approximately 390,000 single nucleotide polymorphism (SNP) targets, adapting a method of Fu et al. PNAS 2013 who enriched a 40,000 year old DNA sample for the entire chromosome 21. Of the SNPs targets, the vast majority overlap the Affymetrix Human Origins array, allowing us to compare the ancient samples to a database of more than 2,700 present-day humans from 250 groups.

We applied the SNP capture as well as mitochondrial genome enrichment to a series of 65 bones dating to between 3,000-9,000 years ago from the Samara district of Russia in the far east of Europe, a region that has been suggested to be part of the Proto-Indo-European homeland. We successfully extracted nuclear data from 10-90% of targeted SNPs for more than 40 of the samples, and for all of these samples also obtained complete mitochondrial genomes. We report three key findings:

- Samples from the Samara region possess Ancient North Eurasian (ANE) admixture related to a recently published 24,000 year old Upper Paleolithic Siberian genome. This contrasts with both European agriculturalists and with European hunter-gatherers from Luxembourg and Iberia who had little such ancestry (Lazaridis et al. 2013). This suggests that European steppe groups may have been implicated in the dispersal of ANE ancestry across Europe where it is currently pervasive.

- The mtDNA composition of the steppe population is primarily West Eurasian, in contrast with northwest Russian samples of this period (Der Sarkissian et al. PLoS Genetics 2013) where an East Eurasian presence is evident.

- Samara experienced major population turnovers over time: early samples (>6000 years) belong primarily to mtDNA haplogroups U4 and U5, typical of European hunter-gatherers but later ones include haplogroups W, H, T, I, K, J.

We report modeling analyses showing how the steppe samples may relate to ancient and present-day DNA samples from the rest of Europe, the Caucasus, and South Asia, thereby clarifying the relationship of steppe groups to the genetic, archaeological and linguistic transformations of the late Neolithic and Bronze ages.

David Reich et al., Genotyping of 390,000 SNPs in more than forty 3,000-9,000 year old humans from the ancient Russian steppe, SMBE 2014 abstract.

The other really interesting abstract from this conference concerns the Ust-Ishim genome from Upper Paleolithic western Siberia (see here). I'm betting its Y-chromosome haplogroup will be P*, but that's pure speculation on my part.

See also...

Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans

The story of R1a: the academics flounder on

Another look at the Lazaridis et al. ancient genomes preprint

First genome of an Upper Paleolithic human (Mal'ta boy)

Wednesday, June 4, 2014

R1a-Z93 from Bronze Age Mongolia

Four out of the eight Bronze Age Altaian Y-chromosomes in this FSI: Genetics paper belonged to haplogroup R1a-Z93 (frequency map here). Moreover, one of the R1a-Z93 samples carried markers for blue eyes and brown hair, and another for dark blond or brown hair. The single Copper Age (or Eneolithic) individual belonged to Y-chromosome haplogroup Q-M242 and was inferred to have dark hair and eyes.

Interestingly, one of the Bronze Age females belonged to a European-specific haplotype within mitochondrial haplogroup H1b, with five modern matches in Poland and one in Portugal. But this isn't the first time that an ancient European-like population has been detected deep in Asia (see here and here). It's a pity that a full genome or two weren't featured in this study, but hopefully we won't have to wait long for that. Here's the paper abstract:

The Altai Mountains have been a long term boundary zone between the Eurasian Steppe populations and South and East Asian populations. To disentangle some of the historical population movements in this area, 14 ancient human specimens excavated in the westernmost part of the Mongolian Altai were studied. Thirteen of them were dated from the Middle to the End of the Bronze Age and one of them to the Eneolithic period. The environmental conditions encountered in this region led to the good preservation of DNA in the human remains. Therefore, a multi-markers approach was adopted for the genetic analysis of identity, ancestry and phenotype markers. Mitochondrial DNA analyses revealed that the ancient Altaians studied carried both Western (H, U, T) and Eastern (A, C, D) Eurasian lineages. In the same way, the patrilineal gene pool revealed the presence of different haplogroups (Q1a2a1-L54, R1a1a1b2-Z93 and C), probably marking different origins for the male paternal lineages. To go further in the search of the origin of these ancient specimens, phenotypical characters (ie: hair and eye colour) were determined. For this purpose, we adapted the HIrisPlex assay recently described to MALDI-TOF mass spectrometry. In addition, some ancestry informative markers were analyzed with this assay. The results revealed mixed phenotypes among this group confirming the probable admixed ancestry of the studied Altaian population at the Middle Bronze Age. The good results obtained from ancient DNA samples suggest that this approach might be relevant for forensic casework too.

Hollard et al., Strong genetic admixture in the Altai at the Middle Bronze Age revealed by uniparental and ancestry informative markers, Forensic Science International: Genetics, published online 04 June 2014, doi:10.1016/j.fsigen.2014.05.012

Tuesday, May 13, 2014

PCA projection bias in ancient DNA studies

Many Principal Component Analyses (PCA) in papers on ancient genomes clearly suffer from projection bias. However, most people don't seem to understand this problem and the impact it can have on the interpretation of the data.

Here's a demonstration of this effect using two PCA. In the first PCA, La Brana-1, a Mesolithic genome from Iberia, was projected onto the PC eigenvectors computed with modern individuals from the HGDP. However, in the second PCA the ancient genome was run together with these samples. Note the clear difference between the two outcomes.

The second outcome does look a bit strange, but it's actually the correct one, because it's now an established fact that Mesolithic hunter-gatherers, like La Brana-1, were clearly outside the range of modern European, and indeed West Eurasian, genetic variation.
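The pull towards the middle of the plot that projection causes can be reproduced with a toy simulation in plain Python. This is not a rerun of the La Brana-1 analysis: the two simulated groups, the sample sizes, the number of markers and the size of the shift between groups are all made-up values, and PC1 is found with a simple power iteration. The idea is to compare PC1 scores for individuals that took part in the PCA against scores for new individuals from the same group that were only projected.

```python
import math
import random

rng = random.Random(0)
N_PER_GROUP, N_SNPS, SHIFT = 20, 500, 0.3  # toy sizes, chosen for illustration

def sample(label):
    """One individual: unit-variance noise around a group mean of +/-SHIFT."""
    return [label * SHIFT + rng.gauss(0, 1) for _ in range(N_SNPS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

labels = [1] * N_PER_GROUP + [-1] * N_PER_GROUP
train = [sample(y) for y in labels]
held_out = [sample(1) for _ in range(N_PER_GROUP)]  # same group as label +1

# Centre everything on the training mean, as a projection would.
mean = [sum(col) / len(train) for col in zip(*train)]

def centre(x):
    return [v - m for v, m in zip(x, mean)]

train_c = [centre(x) for x in train]
held_c = [centre(x) for x in held_out]

# PC1 via power iteration on the small (samples x samples) Gram matrix.
gram = [[dot(a, b) for b in train_c] for a in train_c]
u = [rng.gauss(0, 1) for _ in train_c]
for _ in range(200):
    u = [dot(row, u) for row in gram]
    norm = math.sqrt(dot(u, u))
    u = [v / norm for v in u]

# Turn the sample-space eigenvector into a unit-length axis in SNP space.
axis = [dot(col, u) for col in zip(*train_c)]
axis_norm = math.sqrt(dot(axis, axis))
axis = [v / axis_norm for v in axis]

train_scores = [dot(x, axis) for x, y in zip(train_c, labels) if y == 1]
projected_scores = [dot(x, axis) for x in held_c]
train_mean = abs(sum(train_scores) / len(train_scores))
projected_mean = abs(sum(projected_scores) / len(projected_scores))
print(f"group +1 run in the PCA: {train_mean:.2f}; projected: {projected_mean:.2f}")
```

The projected individuals come from exactly the same simulated group, yet their average PC1 score sits closer to zero, simply because they had no say in where the axis was drawn.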

For a technical discussion of this problem, which is also sometimes known as "shrinkage", refer to Lee et al. 2012. To get an idea of the confusion that it can cause, see the discussion in the comments section under my last blog post:

More info on two Thracian genomes from Iron Age Bulgaria + a complaint

The above experiment with La Brana-1 was run with PLINK 2, which is freely available here, using just over 16K SNPs. Only markers with a read depth of 4x or higher were considered, and the marker set was further pruned to account for no-calls (--geno 0.005), LD (--indep-pairwise 200 25 0.4), and minor allele frequency (--maf 0.05).
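For anyone unsure what the no-call and minor allele frequency filters do, here is a rough Python equivalent of the per-marker logic. It only mirrors the thresholds quoted above (missing call rate above 0.005 is dropped, minor allele frequency below 0.05 is dropped) and leaves out LD pruning entirely. The genotype coding (0, 1 or 2 copies of an allele, None for a no-call), the marker names and the toy data are all assumptions for illustration, not PLINK's own data model.

```python
def missing_rate(genotypes):
    """Fraction of samples with no call at this marker."""
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_freq(genotypes):
    """Minor allele frequency among called genotypes (0/1/2 allele counts)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def keep_snp(genotypes, max_missing=0.005, min_maf=0.05):
    """True if the marker passes both the no-call and the MAF filter."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_freq(genotypes) >= min_maf)

# Toy markers across 200 samples (hypothetical names and genotypes).
snps = {
    "snp_common": [0, 1, 2, 1] * 50,              # no missing calls, MAF 0.5
    "snp_rare": [0] * 196 + [1] * 4,              # MAF 0.01, fails the MAF filter
    "snp_missing": [0, 1, 2, 1] * 49 + [None] * 4,  # 2% no-calls, fails the filter
}

kept = [name for name, genotypes in snps.items() if keep_snp(genotypes)]
print(kept)
```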