Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes

2023 Sep;19(9):e1010931. Epub 2023 Sep 7.

Language: English; Country: United States; Medium: electronic-ecollection

Document type: Journal Article; Research Support, N.I.H., Extramural; Grant-supported work

Persistent link: https://www.medvik.cz/link/pmid37676865

Grant support
R01 HG012287 NHGRI NIH HHS - United States

f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data (that is, all single nucleotide polymorphisms, or SNPs, in the individuals being analyzed), but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African group and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases.
However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
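
The f-statistics discussed above are averages over SNPs of products of allele-frequency differences (f2(A, B) = E[(a − b)²], f3(C; A, B) = E[(c − a)(c − b)], f4(A, B; C, D) = E[(a − b)(c − d)]), so the ascertainment scheme decides which SNPs enter each average. The following is a minimal Python sketch, assuming per-SNP population allele frequencies are already known; it omits finite-sample bias corrections and block-jackknife standard errors. It shows how restricting to SNPs polymorphic in one hypothetical ascertainment panel changes an f4 value. The function names and toy frequencies are illustrative only and are not taken from the study or from any specific software package.

```python
from statistics import fmean


def f2(a, b):
    """f2(A, B): mean squared allele-frequency difference over SNPs."""
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def f3(c, a, b):
    """f3(C; A, B): mean of (c - a)(c - b); a significantly negative value is evidence that C is admixed."""
    return fmean((z - x) * (z - y) for z, x, y in zip(c, a, b))


def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b)(c - d).

    Expected to be 0 when the unrooted topology is ((A, B), (C, D))
    and there is no gene flow between the two pairs.
    """
    return fmean((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d))


def ascertain_polymorphic(freqs, panel_freq):
    """Keep only SNPs that are polymorphic (0 < p < 1) in the ascertainment panel.

    freqs: dict mapping population name to a list of per-SNP allele frequencies.
    panel_freq: per-SNP allele frequencies in a hypothetical panel population.
    """
    keep = [i for i, p in enumerate(panel_freq) if 0.0 < p < 1.0]
    return {pop: [f[i] for i in keep] for pop, f in freqs.items()}


if __name__ == "__main__":
    # Made-up allele frequencies for four populations at four SNPs.
    freqs = {
        "A": [0.1, 0.5, 0.0, 0.9],
        "B": [0.2, 0.4, 0.0, 0.7],
        "C": [0.6, 0.1, 0.3, 0.2],
        "D": [0.5, 0.3, 0.1, 0.4],
    }
    panel = [0.3, 0.0, 0.5, 1.0]  # SNPs at indices 1 and 3 are monomorphic in the panel

    full = f4(freqs["A"], freqs["B"], freqs["C"], freqs["D"])
    asc = ascertain_polymorphic(freqs, panel)
    sub = f4(asc["A"], asc["B"], asc["C"], asc["D"])
    print(f"f4 on all SNPs: {full:.4f}; on panel-ascertained SNPs: {sub:.4f}")
```

In this toy case the two f4 values differ only because different SNPs are averaged; the abstract's concern is that, for real non-outgroup panels, such shifts are systematic and can flip model-fitting conclusions.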

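The abstract proposes ascertaining on variants common in a union of diverse African groups, but does not spell out how "union" or "common" are operationalized. The sketch below therefore makes two labeled assumptions: the listed groups are pooled into one sample, and "common" means a pooled minor allele frequency of at least a hypothetical threshold `MIN_MAF = 0.05`. Group names (`G1`, `G2`) and allele counts are invented for illustration. An alternative reading, requiring the variant to be common in at least one group, would replace the pooled frequency with a per-group check.

```python
MIN_MAF = 0.05  # hypothetical "common" threshold, for illustration only


def pooled_frequency(counts):
    """Allele frequency in the pooled sample.

    counts: iterable of (allele_count, total_allele_count) pairs, one per group.
    Returns 0.0 if no chromosomes were observed.
    """
    counts = list(counts)
    allele = sum(k for k, _ in counts)
    total = sum(n for _, n in counts)
    return allele / total if total else 0.0


def union_ascertained(snp_counts, groups, min_maf=MIN_MAF):
    """Return indices of SNPs whose pooled minor allele frequency over `groups` is >= min_maf.

    snp_counts: list with one dict per SNP, mapping group name to
    (allele_count, total_allele_count). Groups missing at a SNP are skipped.
    """
    keep = []
    for i, per_group in enumerate(snp_counts):
        p = pooled_frequency(per_group[g] for g in groups if g in per_group)
        if min(p, 1.0 - p) >= min_maf:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Made-up allele counts for two hypothetical groups (30 chromosomes each).
    snps = [
        {"G1": (1, 30), "G2": (0, 30)},    # rare in the pooled sample
        {"G1": (6, 30), "G2": (3, 30)},    # common in both groups
        {"G1": (30, 30), "G2": (29, 30)},  # nearly fixed in the pooled sample
        {"G1": (0, 30), "G2": (6, 30)},    # common only in G2
    ]
    print("union of G1 and G2:", union_ascertained(snps, ["G1", "G2"]))
    print("G1 alone:", union_ascertained(snps, ["G1"]))
```

Pooling lets a variant that is common in only one group (index 3 in the toy data) pass the filter, a breadth that ascertainment in a single population lacks. The study's exact ascertainment criterion is given in its Methods and is not reproduced here.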

