STRUCTURE is more robust than other clustering methods in simulated mixed-ploidy populations

Heredity. 2019 Oct;123(4):429-441. Epub 2019 Jul 8.

Language: English. Country: England, Great Britain. Medium: print-electronic.

Document type: journal article; research supported by a grant.

Persistent link: https://www.medvik.cz/link/pmid31285566
Links

PubMed 31285566
PubMed Central PMC6781132
DOI 10.1038/s41437-019-0247-6
PII: 10.1038/s41437-019-0247-6

Analysis of population genetic structure has become a standard approach in population genetics. In polyploid complexes, clustering analyses can elucidate the origin of polyploid populations and patterns of admixture between different cytotypes. However, combining diploid and polyploid data can theoretically lead to biased inference with (artefactual) clustering by ploidy. We used simulated mixed-ploidy (diploid-autotetraploid) data to systematically compare the performance of k-means clustering and the model-based clustering methods implemented in STRUCTURE, ADMIXTURE, FASTSTRUCTURE and INSTRUCT under different scenarios of differentiation and with different marker types. Under scenarios of strong population differentiation, the tested applications performed equally well. However, when population differentiation was weak, STRUCTURE was the only method that allowed unbiased inference from markers with limited genotypic information (co-dominant markers with unknown dosage, or dominant markers). Still, since STRUCTURE was comparatively slow, the much faster but less powerful FASTSTRUCTURE provides a reasonable alternative for large datasets. Finally, although bias makes k-means clustering unsuitable for markers with incomplete genotype information, for large numbers of loci (>1000) with known dosage, k-means clustering was superior to FASTSTRUCTURE in terms of both power and speed. We conclude that STRUCTURE is the most robust method for the analysis of genetic structure in mixed-ploidy populations, although alternative methods should be considered under some specific conditions.
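A small calculation can illustrate why markers with incomplete genotype information can create artefactual clustering by ploidy. Such markers can make diploids and tetraploids look systematically different even when both share exactly the same allele frequencies.

This is a sketch and not the authors' simulation code. It rests on several assumptions:
- Loci are biallelic.
- Genotype proportions are binomial (Hardy-Weinberg type).
- Autotetraploids show random chromosome segregation with no double reduction.

The function names and the marker labels `known_dosage`, `unknown_dosage` and `dominant` are hypothetical.

```python
"""Illustrative sketch: how marker coding can separate cytotypes that share allele frequencies.

Assumptions (hypothetical, not taken from the paper): biallelic loci, binomial
(Hardy-Weinberg-type) genotype proportions, autotetraploids with random
chromosome segregation and no double reduction.
"""
import random
from math import comb


def simulate_dosage(p: float, ploidy: int, rng: random.Random) -> int:
    """Number of copies of allele A in one individual, drawn as Binomial(ploidy, p)."""
    return sum(rng.random() < p for _ in range(ploidy))


def encode(dosage: int, ploidy: int, marker: str) -> float:
    """Score one genotype the way each (hypothetical) marker type observes it."""
    if marker == "known_dosage":
        # Full information: within-individual allele frequency (0, 1/4, ..., 1 for tetraploids).
        return dosage / ploidy
    if marker == "unknown_dosage":
        # Co-dominant, but only allele presence is observed, so every heterozygote
        # (Aa; or AAAa, AAaa, Aaaa in a tetraploid) collapses to one phenotype.
        if dosage == 0:
            return 0.0
        if dosage == ploidy:
            return 1.0
        return 0.5
    if marker == "dominant":
        # Dominant marker (e.g. an AFLP band): only presence of allele A is scored.
        return 1.0 if dosage > 0 else 0.0
    raise ValueError(f"unknown marker type: {marker}")


def expected_encoded(p: float, ploidy: int, marker: str) -> float:
    """Exact expected score: sum over dosages k of Binomial(ploidy, p) weight times score."""
    return sum(
        comb(ploidy, k) * p**k * (1 - p) ** (ploidy - k) * encode(k, ploidy, marker)
        for k in range(ploidy + 1)
    )


def simulated_mean(p: float, ploidy: int, marker: str, n: int, seed: int = 1) -> float:
    """Monte Carlo counterpart of expected_encoded for n simulated individuals."""
    rng = random.Random(seed)
    scores = [encode(simulate_dosage(p, ploidy, rng), ploidy, marker) for _ in range(n)]
    return sum(scores) / n


if __name__ == "__main__":
    for marker in ("known_dosage", "unknown_dosage", "dominant"):
        # Same allele frequency (p = 0.2) in diploids (2x) and tetraploids (4x).
        print(marker, [round(expected_encoded(0.2, k, marker), 3) for k in (2, 4)])
```

At p = 0.2, known-dosage scores have the same expectation (0.2) in diploids and tetraploids. Under the other two codings, tetraploids are shifted upwards relative to diploids:
- Unknown-dosage scores give 0.2 versus 0.296.
- Dominant scores give 0.36 versus 0.59, because the band appears with probability 1 - (1 - p)^ploidy.

Summed over many loci, such a shift is exactly the kind of systematic difference a clustering method could mistake for population differentiation. This is consistent with the abstract's finding that bias arises with markers with incomplete genotype information but not with known dosage.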
