Most cited article - PubMed ID 34671162
The origins and spread of domestic horses from the Western Eurasian steppes
qpAdm is a statistical tool that is often used for testing large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on 2D stepping stone landscapes and in situations with low prestudy odds (low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as number of source combinations per target, model complexity, model feasibility criteria, etc. Those protocols were applied to admixture graph-shaped and stepping stone simulated histories sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations since: (1) prestudy odds are low and fall rapidly with increasing model complexity; (2) complex migration networks violate the assumptions of the method; hence, there is poor correlation between qpAdm P-values and model optimality, contributing to low but nonzero false-positive rate and low power; and (3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric highly nonoptimal models have estimates in the same interval, contributing to the false-positive rate. We also reinterpret large sets of qpAdm models from 2 studies in terms of source-target distance and symmetry and suggest improvements to qpAdm protocols: (1) temporal stratification of targets and proxy sources in the case of admixture graph-shaped histories, (2) focused exploration of few models for increasing prestudy odds; and (3) dense landscape sampling for increasing power and stringent conditions on estimated admixture fractions for decreasing the false-positive rate.
- Keywords
- qpAdm, admixture graphs, archaeogenetics, genetic admixture, simulation, stepping stone models,
- MeSH
- Humans MeSH
- Models, Genetic * MeSH
- Genetics, Population * methods MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
qpAdm is a statistical tool that is often used for testing large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on two-dimensional stepping-stone landscapes and in situations with low pre-study odds (low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as number of source combinations per target, model complexity, model feasibility criteria, etc. Those protocols were applied to admixture-graph-shaped and stepping-stone simulated histories sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations since: 1) pre-study odds are low and fall rapidly with increasing model complexity; 2) complex migration networks violate the assumptions of the method, hence there is poor correlation between qpAdm p-values and model optimality, contributing to low but non-zero false positive rate and low power; 3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric highly non-optimal models have estimates in the same interval, contributing to the false positive rate. We also re-interpret large sets of qpAdm models from two studies in terms of source-target distance and symmetry and suggest improvements to qpAdm protocols: 1) temporal stratification of targets and proxy sources in the case of admixture-graph-shaped histories; 2) focused exploration of few models for increasing pre-study odds; 3) dense landscape sampling for increasing power and stringent conditions on estimated admixture fractions for decreasing the false positive rate.
- Keywords
- admixture graphs, archaeogenetics, genetic admixture, qpAdm, simulation, stepping-stone models,
- Publication type
- Journal Article MeSH
- Preprint MeSH
Our knowledge of human evolutionary history has been greatly advanced by paleogenomics. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in resolving questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation remains an open question. We evaluated the performance and behavior of two commonly used methods, qpAdm and the f3-statistic, on admixture inference under a diversity of demographic models and data conditions. We performed two complementary simulation approaches-firstly exploring a wide demographic parameter space under four simple demographic models of varying complexities and configurations using branch-length data from two chromosomes-and secondly, we analyzed a model of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudohaploidization. We observe that population differentiation is the primary factor driving qpAdm performance. Notably, while complex gene flow histories influence which models are classified as plausible, they do not reduce overall performance. Under conditions reflective of the historical period, qpAdm most frequently identifies the true model as plausible among a small candidate set of closely related populations. To increase the utility for resolving fine-scaled hypotheses, we provide a heuristic for further distinguishing between candidate models that incorporates qpAdm model P-values and f3-statistics. Finally, we demonstrate a significant performance increase for qpAdm using whole-genome branch-length f2-statistics, highlighting the potential for improved demographic inference that could be achieved with future advancements in f-statistic estimations.
- Keywords
- f-statistics, aDNA, admixture, ancient DNA, archaeogenetics, paleogenomics, qpAdm,
- MeSH
- Demography MeSH
- Genomics * methods MeSH
- Models, Genetic MeSH
- Paleontology * methods MeSH
- Software MeSH
- Data Accuracy MeSH
- Publication type
- Journal Article MeSH
Horses revolutionized human history with fast mobility1. However, the timeline between their domestication and their widespread integration as a means of transport remains contentious2-4. Here we assemble a collection of 475 ancient horse genomes to assess the period when these animals were first reshaped by human agency in Eurasia. We find that reproductive control of the modern domestic lineage emerged around 2200 BCE, through close-kin mating and shortened generation times. Reproductive control emerged following a severe domestication bottleneck starting no earlier than approximately 2700 BCE, and coincided with a sudden expansion across Eurasia that ultimately resulted in the replacement of nearly every local horse lineage. This expansion marked the rise of widespread horse-based mobility in human history, which refutes the commonly held narrative of large horse herds accompanying the massive migration of steppe peoples across Europe around 3000 BCE and earlier3,5. Finally, we detect significantly shortened generation times at Botai around 3500 BCE, a settlement from central Asia associated with corrals and a subsistence economy centred on horses6,7. This supports local horse husbandry before the rise of modern domestic bloodlines.
- MeSH
- Animal Husbandry * history MeSH
- History, Ancient MeSH
- Domestication * MeSH
- Transportation * history methods MeSH
- Phylogeny MeSH
- Genome genetics MeSH
- Horses * classification genetics MeSH
- Reproduction MeSH
- Animals MeSH
- Check Tag
- History, Ancient MeSH
- Male MeSH
- Female MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Historical Article MeSH
- Geographicals
- Asia MeSH
- Europe MeSH
Although Europe was not a primary centre of cattle domestication, its expansion from the Middle East and subsequent development created a complex pattern of cattle breed diversity. Many isolated populations of local historical breeds still carry the message about the physical and genetic traits of ancient populations. Since the way of life of human communities starting from the eleventh millennium BP was strongly determined by livestock husbandry, the knowledge of cattle diversity through the ages is helpful in the interpretation of many archaeological findings. Historical cattle diversity is currently at the intersection of two leading directions of genetic research. Firstly, it is archaeogenetics attempting to recover and interpret the preserved genetic information directly from archaeological finds. The advanced archaeogenetic approaches meet with the population genomics of extant cattle populations. The immense amount of genetic information collected from living cattle, due to its key economic role, allows for reconstructing the genetic profiles of the ancient populations backwards. The present paper aims to place selected archaeogenetic, genetic, and genomic findings in the picture of cattle history in Central Europe, as suggested by archaeozoological and historical records. Perspectives of the methodical connection between the genetic approaches and the approaches of traditional archaeozoology, such as osteomorphology and osteometry, are discussed. The importance, actuality, and effectiveness of combining different approaches to each archaeological find, such as morphological characterization, interpretation of the historical context, and molecular data, are stressed.
- Keywords
- Czech Red cattle, archaic DNA, aurochs, historical cattle, hornlessness, osteometry, sexual dimorphism,
- Publication type
- Journal Article MeSH
- Review MeSH
Western Eurasia witnessed several large-scale human migrations during the Holocene1-5. Here, to investigate the cross-continental effects of these migrations, we shotgun-sequenced 317 genomes-mainly from the Mesolithic and Neolithic periods-from across northern and western Eurasia. These were imputed alongside published data to obtain diploid genotypes from more than 1,600 ancient humans. Our analyses revealed a 'great divide' genomic boundary extending from the Black Sea to the Baltic. Mesolithic hunter-gatherers were highly genetically differentiated east and west of this zone, and the effect of the neolithization was equally disparate. Large-scale ancestry shifts occurred in the west as farming was introduced, including near-total replacement of hunter-gatherers in many areas, whereas no substantial ancestry shifts happened east of the zone during the same period. Similarly, relatedness decreased in the west from the Neolithic transition onwards, whereas, east of the Urals, relatedness remained high until around 4,000 BP, consistent with the persistence of localized groups of hunter-gatherers. The boundary dissolved when Yamnaya-related ancestry spread across western Eurasia around 5,000 BP, resulting in a second major turnover that reached most parts of Europe within a 1,000-year span. The genetic origin and fate of the Yamnaya have remained elusive, but we show that hunter-gatherers from the Middle Don region contributed ancestry to them. Yamnaya groups later admixed with individuals associated with the Globular Amphora culture before expanding into Europe. Similar turnovers occurred in western Siberia, where we report new genomic data from a 'Neolithic steppe' cline spanning the Siberian forest steppe to Lake Baikal. These prehistoric migrations had profound and lasting effects on the genetic diversity of Eurasian populations.
- MeSH
- History, Ancient MeSH
- Diploidy MeSH
- Genome, Human * MeSH
- Genotype MeSH
- Ice Cover MeSH
- Humans MeSH
- Hunting history MeSH
- Metagenomics * MeSH
- Human Migration * history MeSH
- Genetics, Population * MeSH
- Agriculture history MeSH
- Check Tag
- History, Ancient MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Geographicals
- Black Sea MeSH
- Europe ethnology MeSH
- Asia, Western MeSH
Paleogenomics has expanded our knowledge of human evolutionary history. Since the 2020s, the study of ancient DNA has increased its focus on reconstructing the recent past. However, the accuracy of paleogenomic methods in answering questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation within the historical period remains an open question. We used two simulation approaches to evaluate the limitations and behavior of commonly used methods, qpAdm and the f3-statistic, on admixture inference. The first is based on branch-length data simulated from four simple demographic models of varying complexities and configurations. The second, an analysis of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudo-haploidization. We show that under conditions resembling historical populations, qpAdm can identify a small candidate set of true sources and populations closely related to them. However, in typical ancient DNA conditions, qpAdm is unable to further distinguish between them, limiting its utility for resolving fine-scaled hypotheses. Notably, we find that complex gene-flow histories generally lead to improvements in the performance of qpAdm and observe no bias in the estimation of admixture weights. We offer a heuristic for admixture inference that incorporates admixture weight estimate and P-values of qpAdm models, and f3-statistics to enhance the power to distinguish between multiple plausible candidates. Finally, we highlight the future potential of qpAdm through whole-genome branch-length f2-statistics, demonstrating the improved demographic inference that could be achieved with advancements in f-statistic estimations.
- Keywords
- aDNA, admixture, archaeogenetics, f-statistics, paleogenomics, qpAdm,
- Publication type
- Journal Article MeSH
- Preprint MeSH
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data-that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed-but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array" which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases. However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
- MeSH
- African People * genetics MeSH
- Biological Variation, Population genetics MeSH
- Black People genetics MeSH
- Demography * history MeSH
- Phylogeny * MeSH
- Genotype MeSH
- Polymorphism, Single Nucleotide * genetics MeSH
- Humans MeSH
- Chromosome Mapping MeSH
- Neanderthals genetics MeSH
- Models, Statistical MeSH
- Bias MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Our understanding of population history in deep time has been assisted by fitting admixture graphs (AGs) to data: models that specify the ordering of population splits and mixtures, which along with the amount of genetic drift and the proportions of mixture, is the only information needed to predict the patterns of allele frequency correlation among populations. The space of possible AGs relating populations is vast, and thus most published studies have identified fitting AGs through a manual process driven by prior hypotheses, leaving the majority of alternative models unexplored. Here, we develop a method for systematically searching the space of all AGs that can incorporate non-genetic information in the form of topology constraints. We implement this findGraphs tool within a software package, ADMIXTOOLS 2, which is a reimplementation of the ADMIXTOOLS software with new features and large performance gains. We apply this methodology to identify alternative models to AGs that played key roles in eight publications and find that in nearly all cases many alternative models fit nominally or significantly better than the published one. Our results suggest that strong claims about population history from AGs should only be made when all well-fitting and temporally plausible models share common topological features. Our re-evaluation of published data also provides insight into the population histories of humans, dogs, and horses, identifying features that are stable across the models we explored, as well as scenarios of populations relationships that differ in important ways from models that have been highlighted in the literature.
- Keywords
- admixture graphs, dogs, evolutionary biology, f-statistics, genetics, genomics, horses, human, humans, population genetics,
- MeSH
- Gene Frequency MeSH
- Genetic Drift MeSH
- Hominidae * MeSH
- Horses MeSH
- Humans MeSH
- Models, Genetic MeSH
- Genetics, Population * MeSH
- Dogs MeSH
- Software MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Dogs MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
f -statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. These statistics can provide strong evidence for either admixture or cladality, which can be robust to substantial rates of errors or missing data. f -statistics are guaranteed to be unbiased under "SNP ascertainment" (analyzing non-randomly chosen subsets of single nucleotide polymorphisms) only if it relies on a population that is an outgroup for all groups analyzed. However, ascertainment on a true outgroup that is not co-analyzed with other populations is often impractical and uncommon in the literature. In this study focused on practical rather than theoretical aspects of SNP ascertainment, we show that many non-outgroup ascertainment schemes lead to false rejection of true demographic histories, as well as to failure to reject incorrect models. But the bias introduced by common ascertainments such as the 1240K panel is mostly limited to situations when more than one sub-Saharan African and/or archaic human groups (Neanderthals and Denisovans) or non-human outgroups are co-modelled, for example, f 4 -statistics involving one non-African group, two African groups, and one archaic group. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, cannot fix all these problems since for some classes of f -statistics it is not a clean outgroup ascertainment, and in other cases it demonstrates relatively low power to reject incorrect demographic models since it provides a relatively small number of variants common in anatomically modern humans. And due to the paucity of high-coverage archaic genomes, archaic individuals used for ascertainment often act as sole representatives of the respective groups in an analysis, and we show that this approach is highly problematic. By carrying out large numbers of simulations of diverse demographic histories, we find that bias in inferences based on f -statistics introduced by non-outgroup ascertainment can be minimized if the derived allele frequency spectrum in the population used for ascertainment approaches the spectrum that existed at the root of all groups being co-analyzed. Ascertaining on sites with variants common in a diverse group of African individuals provides a good approximation to such a set of SNPs, addressing the great majority of biases and also retaining high statistical power for studying population history. Such a "pan-African" ascertainment, although not completely problem-free, allows unbiased exploration of demographic models for the widest set of archaic and modern human populations, as compared to the other ascertainment schemes we explored.
- Publication type
- Journal Article MeSH
- Preprint MeSH