Most cited article - PubMed ID 35176016
Indian genetic heritage in Southeast Asian populations
BACKGROUND: Processes shaping the formation of the present-day population structure in highly urbanized Northern Europe are still poorly understood. Gaps remain in our understanding of when and how currently observable regional differences emerged and what impact city growth, migration, and disease pandemics during and after the Middle Ages had on these processes. RESULTS: We perform low-coverage sequencing of the genomes of 338 individuals spanning the eighth to the eighteenth centuries in the city of Sint-Truiden in Flanders, in the northern part of Belgium. The early/high medieval Sint-Truiden population was more heterogeneous, having received migrants from Scotland or Ireland, and displayed less genetic relatedness than observed today between individuals in present-day Flanders. We find differences in gene variants associated with high vitamin D blood levels between individuals with Gaulish or Germanic ancestry. Although we find evidence of a Yersinia pestis infection in 5 of the 58 late medieval burials, we were unable to detect a major population-scale impact of the second plague pandemic on genetic diversity or on the elevated differentiation of immunity genes. CONCLUSIONS: This study reveals that the genetic homogenization process in a medieval city population in the Low Countries was protracted for centuries. Over time, the Sint-Truiden population became more similar to the current population of the surrounding Limburg province, likely as a result of reduced long-distance migration after the high medieval period, and the continuous process of local admixture of Germanic and Gaulish ancestries which formed the genetic cline observable today in the Low Countries.
- Keywords
- Flanders, Low countries, Medieval, Migration, Palaeo-genomics, Plague, Urbanization
- MeSH
- History, Medieval MeSH
- Genetic Variation MeSH
- Genome, Human MeSH
- Genomics MeSH
- Humans MeSH
- Plague epidemiology history genetics MeSH
- Genetics, Population MeSH
- Urbanization * history MeSH
- Check Tag
- History, Medieval MeSH
- Humans MeSH
- Publication type
- Journal Article MeSH
- Historical Article MeSH
- Geographicals
- Belgium MeSH
qpAdm is a statistical tool that is often used for testing large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on 2D stepping stone landscapes and in situations with low prestudy odds (low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as number of source combinations per target, model complexity, model feasibility criteria, etc. Those protocols were applied to admixture graph-shaped and stepping stone simulated histories sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations since: (1) prestudy odds are low and fall rapidly with increasing model complexity; (2) complex migration networks violate the assumptions of the method; hence, there is poor correlation between qpAdm P-values and model optimality, contributing to low but nonzero false-positive rate and low power; and (3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric highly nonoptimal models have estimates in the same interval, contributing to the false-positive rate. We also reinterpret large sets of qpAdm models from 2 studies in terms of source-target distance and symmetry and suggest improvements to qpAdm protocols: (1) temporal stratification of targets and proxy sources in the case of admixture graph-shaped histories, (2) focused exploration of few models for increasing prestudy odds; and (3) dense landscape sampling for increasing power and stringent conditions on estimated admixture fractions for decreasing the false-positive rate.
- Keywords
- qpAdm, admixture graphs, archaeogenetics, genetic admixture, simulation, stepping stone models
- MeSH
- Humans MeSH
- Models, Genetic * MeSH
- Genetics, Population * methods MeSH
- Software * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
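The abstract above attributes false discovery rates above 50% partly to low prestudy odds. The standard relationship between prestudy odds, false-positive rate, power and false discovery rate can be sketched as follows. This is only an illustration of that general formula, not the simulation pipeline of the study; the function name and the numeric values (a false-positive rate of 0.05 and a power of 0.5) are assumptions chosen for the example and do not come from the article.

```python
def false_discovery_rate(prestudy_odds: float, alpha: float, power: float) -> float:
    """Expected share of accepted models that are false.

    prestudy_odds: ratio of true to false models among those tested.
    alpha: false-positive rate (chance that a false model is accepted).
    power: chance that a true model is accepted.
    """
    if prestudy_odds < 0 or not 0 <= alpha <= 1 or not 0 <= power <= 1:
        raise ValueError("odds must be >= 0; alpha and power must lie in [0, 1]")
    # Per one false model tested: expected accepted true and false models.
    true_positives = power * prestudy_odds
    false_positives = alpha
    accepted = true_positives + false_positives
    if accepted == 0:
        raise ValueError("no model is ever accepted with these parameters")
    return false_positives / accepted


if __name__ == "__main__":
    # Hypothetical values: the rate rises as prestudy odds fall.
    for odds in (1.0, 0.1, 0.01):
        fdr = false_discovery_rate(odds, alpha=0.05, power=0.5)
        print(f"prestudy odds {odds:>5}: FDR = {fdr:.2f}")
```

Under these assumed values, even a low false-positive rate yields mostly false accepted models once true models are rare among those tested, which matches the direction of the effect reported in the abstract.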
qpAdm is a statistical tool that is often used for testing large sets of alternative admixture models for a target population. Despite its popularity, qpAdm remains untested on two-dimensional stepping-stone landscapes and in situations with low pre-study odds (low ratio of true to false models). We tested high-throughput qpAdm protocols with typical properties such as number of source combinations per target, model complexity, model feasibility criteria, etc. Those protocols were applied to admixture-graph-shaped and stepping-stone simulated histories sampled randomly or systematically. We demonstrate that false discovery rates of high-throughput qpAdm protocols exceed 50% for many parameter combinations since: 1) pre-study odds are low and fall rapidly with increasing model complexity; 2) complex migration networks violate the assumptions of the method, hence there is poor correlation between qpAdm p-values and model optimality, contributing to low but non-zero false positive rate and low power; 3) although admixture fraction estimates between 0 and 1 are largely restricted to symmetric configurations of sources around a target, a small fraction of asymmetric highly non-optimal models have estimates in the same interval, contributing to the false positive rate. We also re-interpret large sets of qpAdm models from two studies in terms of source-target distance and symmetry and suggest improvements to qpAdm protocols: 1) temporal stratification of targets and proxy sources in the case of admixture-graph-shaped histories; 2) focused exploration of few models for increasing pre-study odds; 3) dense landscape sampling for increasing power and stringent conditions on estimated admixture fractions for decreasing the false positive rate.
- Keywords
- admixture graphs, archaeogenetics, genetic admixture, qpAdm, simulation, stepping-stone models
- Publication type
- Journal Article MeSH
- Preprint MeSH
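The abstract above refers to "model feasibility criteria" and to "stringent conditions on estimated admixture fractions" without listing them. A minimal sketch of what such a filter might look like is given below. Everything in it is an assumption made for illustration: the `QpAdmResult` record, the P-value threshold of 0.05, and the stringent variant requiring each estimate plus or minus two standard errors to stay within [0, 1] are common choices in the literature, not the criteria confirmed by this article, and the code does not call any real qpAdm software.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QpAdmResult:
    """Hypothetical container for the output of one admixture model."""
    sources: Sequence[str]
    p_value: float
    weights: Sequence[float]      # estimated admixture fractions
    std_errors: Sequence[float]   # standard errors of those fractions


def is_feasible(result: QpAdmResult, p_threshold: float = 0.05,
                stringent: bool = False) -> bool:
    """Return True if the model passes the assumed feasibility criteria."""
    # A model whose P-value falls below the threshold is rejected.
    if result.p_value < p_threshold:
        return False
    for weight, se in zip(result.weights, result.std_errors):
        if stringent:
            # Assumed stringent rule: weight +/- 2 SE must stay inside [0, 1].
            if weight - 2 * se < 0 or weight + 2 * se > 1:
                return False
        elif not 0 <= weight <= 1:
            # Basic rule: the point estimate alone must lie in [0, 1].
            return False
    return True


if __name__ == "__main__":
    model = QpAdmResult(["SourceA", "SourceB"], 0.21, [0.93, 0.07], [0.05, 0.05])
    print(is_feasible(model))                  # basic criteria
    print(is_feasible(model, stringent=True))  # stringent criteria
```

In this example the model passes the basic rule but fails the stringent one, because the smaller fraction is within two standard errors of zero.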
Our knowledge of human evolutionary history has been greatly advanced by paleogenomics. Since the 2020s, the study of ancient DNA has increasingly focused on reconstructing the recent past. However, the accuracy of paleogenomic methods in resolving questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation remains an open question. We evaluated the performance and behavior of two commonly used methods, qpAdm and the f3-statistic, on admixture inference under a diversity of demographic models and data conditions. We used two complementary simulation approaches. First, we explored a wide demographic parameter space under four simple demographic models of varying complexities and configurations using branch-length data from two chromosomes. Second, we analyzed a model of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudohaploidization. We observe that population differentiation is the primary factor driving qpAdm performance. Notably, while complex gene flow histories influence which models are classified as plausible, they do not reduce overall performance. Under conditions reflective of the historical period, qpAdm most frequently identifies the true model as plausible among a small candidate set of closely related populations. To increase the utility for resolving fine-scaled hypotheses, we provide a heuristic for further distinguishing between candidate models that incorporates qpAdm model P-values and f3-statistics. Finally, we demonstrate a significant performance increase for qpAdm using whole-genome branch-length f2-statistics, highlighting the potential for improved demographic inference that could be achieved with future advancements in f-statistic estimations.
- Keywords
- f-statistics, aDNA, admixture, ancient DNA, archaeogenetics, paleogenomics, qpAdm
- MeSH
- Demography MeSH
- Genomics * methods MeSH
- Models, Genetic MeSH
- Paleontology * methods MeSH
- Software MeSH
- Data Accuracy MeSH
- Publication type
- Journal Article MeSH
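The f2- and f3-statistics named in the abstract above have simple textbook definitions in terms of allele frequencies, and a negative f3 for a target relative to two sources is the classic signal of admixture. The sketch below illustrates this on made-up data. It is not the simulation framework of the study: the allele frequencies are drawn at random, the mixing proportion `ALPHA` is a hypothetical value, and the statistics are computed directly from population frequencies without the sample-size bias correction or the drift after admixture that real analyses must handle.

```python
import random
from typing import Sequence


def f2(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared allele-frequency difference between populations A and B."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def f3(target: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """f3(target; A, B): mean of (c - a) * (c - b) over all SNPs."""
    return sum((c - x) * (c - y) for c, x, y in zip(target, a, b)) / len(target)


# Hypothetical allele frequencies for two source populations at 1000 SNPs.
rng = random.Random(0)
N_SNPS = 1000
freq_a = [rng.random() for _ in range(N_SNPS)]
freq_b = [rng.random() for _ in range(N_SNPS)]

# The target is an idealized mixture: ALPHA from A and (1 - ALPHA) from B.
ALPHA = 0.4
freq_target = [ALPHA * x + (1 - ALPHA) * y for x, y in zip(freq_a, freq_b)]

f3_value = f3(freq_target, freq_a, freq_b)
expected = -ALPHA * (1 - ALPHA) * f2(freq_a, freq_b)

if __name__ == "__main__":
    print(f"f3(target; A, B) = {f3_value:.4f}")
    print(f"-alpha * (1 - alpha) * f2(A, B) = {expected:.4f}")
```

In this idealized setting the two printed values agree, since substituting the mixture into the definition gives f3 = -alpha(1 - alpha) f2(A, B). Drift in the target after admixture adds a positive term, which is one reason the signal can be masked in real data.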
Paleogenomics has expanded our knowledge of human evolutionary history. Since the 2020s, the study of ancient DNA has increased its focus on reconstructing the recent past. However, the accuracy of paleogenomic methods in answering questions of historical and archaeological importance amidst the increased demographic complexity and decreased genetic differentiation within the historical period remains an open question. We used two simulation approaches to evaluate the limitations and behavior of commonly used methods, qpAdm and the f3-statistic, on admixture inference. The first is based on branch-length data simulated from four simple demographic models of varying complexities and configurations. The second is an analysis of a model of Eurasian history composed of 59 populations using whole-genome data modified with ancient DNA conditions such as SNP ascertainment, data missingness, and pseudo-haploidization. We show that under conditions resembling historical populations, qpAdm can identify a small candidate set of true sources and populations closely related to them. However, in typical ancient DNA conditions, qpAdm is unable to further distinguish between them, limiting its utility for resolving fine-scaled hypotheses. Notably, we find that complex gene-flow histories generally lead to improvements in the performance of qpAdm and observe no bias in the estimation of admixture weights. We offer a heuristic for admixture inference that combines the admixture weight estimates and P-values of qpAdm models with f3-statistics to enhance the power to distinguish between multiple plausible candidates. Finally, we highlight the future potential of qpAdm through whole-genome branch-length f2-statistics, demonstrating the improved demographic inference that could be achieved with advancements in f-statistic estimations.
- Keywords
- aDNA, admixture, archaeogenetics, f-statistics, paleogenomics, qpAdm
- Publication type
- Journal Article MeSH
- Preprint MeSH
f-statistics have emerged as a first line of analysis for making inferences about demographic history from genome-wide data. Not only are they guaranteed to allow robust tests of the fits of proposed models of population history to data when analyzing full genome sequencing data, that is, all single nucleotide polymorphisms (SNPs) in the individuals being analyzed, but they are also guaranteed to allow robust tests of models for SNPs ascertained as polymorphic in a population that is an outgroup in a phylogenetic sense to all groups being analyzed. True "outgroup ascertainment" is in practice impossible in humans because our species has arisen from a substructured ancestral population that does not descend from a homogeneous ancestral population going back many hundreds of thousands of years into the past. However, initial studies suggested that non-outgroup-ascertainment schemes might produce robust enough results using f-statistics, and that motivated widespread fitting of models to data using non-outgroup-ascertained SNP panels such as the "Affymetrix Human Origins array", which has been genotyped on thousands of modern individuals from hundreds of populations, or the "1240k" in-solution enrichment reagent, which has been the source of about 70% of published genome-wide data for ancient humans. In this study, we show that while analyses of population history using such panels work well for studies of relationships among non-African populations and one African outgroup, when co-modeling more than one sub-Saharan African and/or archaic human group (Neanderthals and Denisovans), fitting of f-statistics to such SNP sets is expected to frequently lead to false rejection of true demographic histories, and failure to reject incorrect models. Analyzing panels of SNPs polymorphic in archaic humans, which has been suggested as a solution for the ascertainment problem, has limited statistical power and retains important biases. However, by carrying out simulations of diverse demographic histories, we show that bias in inferences based on f-statistics can be minimized by ascertaining on variants common in a union of diverse African groups; such ascertainment retains high statistical power while allowing co-analysis of archaic and modern groups.
- MeSH
- African People * genetics MeSH
- Biological Variation, Population genetics MeSH
- Black People genetics MeSH
- Demography * history MeSH
- Phylogeny * MeSH
- Genotype MeSH
- Polymorphism, Single Nucleotide * genetics MeSH
- Humans MeSH
- Chromosome Mapping MeSH
- Neanderthals genetics MeSH
- Models, Statistical MeSH
- Bias MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Thailand is a country where over 60 languages from five language families (Austroasiatic, Austronesian, Hmong-Mien, Kra-Dai, and Sino-Tibetan) are spoken. The Kra-Dai language family is the most prevalent, and Thai, the official language of the country, belongs to it. Previous genome-wide studies on Thailand populations revealed a complex population structure and put some hypotheses forward concerning the population history of the country. However, many published populations have not been co-analyzed, and some aspects of population history were not explored adequately. In this study, we employ new methods to re-analyze published genome-wide genetic data on Thailand populations, with a focus on 14 Kra-Dai-speaking groups. Our analyses reveal South Asian ancestry in Kra-Dai-speaking Lao Isan and Khonmueang, and in Austroasiatic-speaking Palaung, in contrast to a previous study in which the data were generated. We support the admixture scenario for the formation of Kra-Dai-speaking groups from Thailand who harbor both Austroasiatic-related ancestry and Kra-Dai-related ancestry from outside of Thailand. We also provide evidence of bidirectional admixture between Southern Thai and Nayu, an Austronesian-speaking group from Southern Thailand. Challenging some previously reported genetic analyses, we reveal a close genetic relationship between Nayu and Austronesian-speaking groups from Island Southeast Asia (ISEA).
- MeSH
- Asian * ethnology genetics MeSH
- Asian People * ethnology genetics MeSH
- Genome-Wide Association Study MeSH
- Language * MeSH
- Humans MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Asia, Southeastern ethnology MeSH
- Thailand MeSH
Indian cultural influence is remarkable in present-day Mainland Southeast Asia (MSEA), and it may have stimulated early state formation in the region. Various present-day populations in MSEA harbor a low level of South Asian ancestry, but previous studies failed to detect such ancestry in any ancient individual from MSEA. In this study, we discovered a substantial level of South Asian admixture (ca. 40-50%) in a Protohistoric individual from the Vat Komnou cemetery at the Angkor Borei site in Cambodia. The location and the direct radiocarbon dating result on the human bone (95% confidence interval is 78-234 calCE) indicate that this individual lived during the early period of Funan, one of the earliest states in MSEA, which shows that the South Asian gene flow to Cambodia started about a millennium earlier than indicated by previously published results of genetic dating relying on present-day populations. Plausible proxies for the South Asian ancestry source in this individual are present-day populations in Southern India, and the individual shares more genetic drift with present-day Cambodians than with most present-day East and Southeast Asian populations.
- MeSH
- Asian People MeSH
- South Asian People MeSH
- Humans MeSH
- Genetics, Population * MeSH
- DNA, Ancient * MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Cambodia MeSH
- Names of Substances
- DNA, Ancient * MeSH