Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes

P. Flegontov, U. Işıldak, R. Maier, E. Yüncü, P. Changmai, D. Reich

PLoS genetics. 2023 ; 19 (9) : e1010931. [pub] 20230907

Language English Country United States

Document type Journal Article, Research Support, N.I.H., Extramural, Research Support, Non-U.S. Gov't

Grant support
R01 HG012287 NHGRI NIH HHS - United States

f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data, that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed, but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases. However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
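As an illustration of the terms used in the abstract, the following is a minimal sketch of how an f4-statistic is commonly computed from allele frequencies, and of what "ascertaining" SNPs as polymorphic in one population means in practice. The function names, population labels and frequencies are hypothetical and chosen only for this example; they are not taken from the article, and the toy numbers do not reproduce any of its results. The sketch only shows that restricting the SNP set can change the value of the statistic.

```python
from statistics import mean

def f4(snps, a, b, c, d):
    """f4(a, b; c, d): average over SNPs of (p_a - p_b) * (p_c - p_d)."""
    return mean([(s[a] - s[b]) * (s[c] - s[d]) for s in snps])

def ascertain(snps, panel):
    """Keep only SNPs that are polymorphic (0 < frequency < 1) in `panel`."""
    return [s for s in snps if 0.0 < s[panel] < 1.0]

# Hypothetical allele frequencies for four populations at four SNPs.
toy_snps = [
    {"A": 0.5,  "B": 0.0,  "C": 1.0, "D": 0.5},
    {"A": 0.0,  "B": 0.0,  "C": 0.5, "D": 0.0},  # fixed in A: dropped below
    {"A": 1.0,  "B": 0.5,  "C": 0.5, "D": 0.5},  # fixed in A: dropped below
    {"A": 0.25, "B": 0.75, "C": 0.5, "D": 1.0},
]

# Statistic on all SNPs (the "full sequencing data" case).
f4_all = f4(toy_snps, "A", "B", "C", "D")

# Statistic on SNPs ascertained as polymorphic in population A,
# which here is one of the analyzed groups, not an outgroup.
panel_snps = ascertain(toy_snps, "A")
f4_panel = f4(panel_snps, "A", "B", "C", "D")

print(f4_all, f4_panel)
```

In this toy case the two values differ because the ascertained panel discards the SNPs that are fixed in population A.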

000      
00000naa a2200000 a 4500
001      
bmc23016266
003      
CZ-PrNML
005      
20231026110057.0
007      
ta
008      
231013s2023 xxu f 000 0|eng||
009      
AR
024    7_
$a 10.1371/journal.pgen.1010931 $2 doi
035    __
$a (PubMed)37676865
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxu
100    1_
$a Flegontov, Pavel $u Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America $u Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia $u Kalmyk Research Center of the Russian Academy of Sciences, Elista, Russia $1 https://orcid.org/0000000197594981
245    10
$a Modeling of African population history using f-statistics is biased when applying all previously proposed SNP ascertainment schemes / $c P. Flegontov, U. Işıldak, R. Maier, E. Yüncü, P. Changmai, D. Reich
520    9_
$a f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data, that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed, but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases. However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
650    _2
$a Animals $7 D000818
650    _2
$a Humans $7 D006801
650    _2
$a Black People $x genetics $7 D044383
650    _2
$a Chromosome Mapping $7 D002874
650    _2
$a Genotype $7 D005838
650    _2
$a Neanderthals $x genetics $7 D059125
650    12
$a Phylogeny $7 D010802
650    12
$a Polymorphism, Single Nucleotide $x genetics $7 D020641
650    12
$a African People $x genetics $7 D000094842
650    12
$a Demography $x history $7 D003710
650    _2
$a Biological Variation, Population $x genetics $7 D000073537
650    _2
$a Models, Statistical $7 D015233
650    _2
$a Bias (Epidemiology) $7 D015982
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, N.I.H., Extramural $7 D052061
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
700    1_
$a Işıldak, Ulaş $u Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia $1 https://orcid.org/0000000164976254
700    1_
$a Maier, Robert $u Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
700    1_
$a Yüncü, Eren $u Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
700    1_
$a Changmai, Piya $u Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czechia
700    1_
$a Reich, David $u Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America $u Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America $u Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, United States of America $u Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America $1 https://orcid.org/0000000270375292 $7 jo20191025226
773    0_
$w MED00008920 $t PLoS genetics $x 1553-7404 $g Vol. 19, No. 9 (2023), p. e1010931
856    41
$u https://pubmed.ncbi.nlm.nih.gov/37676865 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20231013 $b ABA008
991    __
$a 20231026110051 $b ABA008
999    __
$a ok $b bmc $g 2000029 $s 1202628
BAS    __
$a 3
BAS    __
$a PreBMC-MEDLINE
BMC    __
$a 2023 $b 19 $c 9 $d e1010931 $e 20230907 $i 1553-7404 $m PLoS genetics $n PLoS Genet $x MED00008920
GRA    __
$a R01 HG012287 $p NHGRI NIH HHS $2 United States
LZP    __
$a Pubmed-20231013
