Impact of erroneous marker data on the accuracy of narrow-sense heritability
Language English Country United States Media print
Document type journal article
Grant support
1312
Faculty of Forestry and Wood Sciences at the Czech University of Life Sciences in Prague
PubMed
41128645
PubMed Central
PMC12774826
DOI
10.1093/genetics/iyaf230
PII: 8300213
- Keywords
- allelic-based simulation, genetic evaluation, genomic relationship matrix, genotyping error, single nucleotide polymorphism
- MeSH
- Genetic Markers MeSH
- Genotype MeSH
- Polymorphism, Single Nucleotide * MeSH
- Quantitative Trait, Heritable * MeSH
- Humans MeSH
- Quantitative Trait Loci MeSH
- Models, Genetic * MeSH
- Computer Simulation MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Genetic Markers MeSH
Genomic relationship matrices computed from single nucleotide polymorphism (SNP) data are now widely used to estimate narrow-sense heritability (h²), yet the impact of genotyping error on these estimates is not well understood. We used stochastic simulation and supporting algebra to examine this impact and its interplay with marker density. Starting from a diploid founder population with 300 additive quantitative trait loci, we simulated SNP panels with densities ranging from 6.25 to 50 SNPs per cM and traits with true h² of either 0.2 or 0.6. Genotypes were then altered at error rates ε = 0 to 1 under three error kernels. For each of 100 simulation replicates, we calculated the genomic relationship matrix using VanRaden's method and estimated h² with restricted maximum likelihood (REML). In the absence of error, low-density marker panels underestimated h². Sparse panels were also the most tolerant up to ε ≈ 0.1 yet still underestimated h². Conversely, the densest panel recovered the true h² when ε = 0, but even a small error ε > 0.01 caused an upward bias. The analysis reveals that all distortions are attributable to: (i) a shift in the mean off-diagonal elements of the genomic relationship matrix with magnitude (1−ε)² and (ii) a change in the ratio between the mean diagonal and mean off-diagonal elements of the genomic relationship matrix. When ε ≳ 0.6, every kernel pushed h² toward zero. Thus, even modest genotyping error can inflate or deflate additive genetic variance estimates. SNP panels therefore require rigorous laboratory quality control, error-aware imputation, and statistical models that account for genotype uncertainty when estimating h².
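The two effects named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' simulation pipeline. It assumes unlinked loci, full-sib families drawn from unrelated parents, known base allele frequencies for centring, and one hypothetical error kernel in which an erroneous genotype is redrawn from Hardy-Weinberg proportions. The function names `vanraden_grm`, `simulate_full_sibs`, and `add_error` are illustrative only. The relationship matrix follows the first method of VanRaden (2008), G = ZZ' / (2 Σ p(1−p)).

```python
import numpy as np

def vanraden_grm(M, p):
    """VanRaden (2008) method 1: G = ZZ' / (2 * sum p(1-p)), with Z = M - 2p."""
    Z = np.asarray(M, dtype=float) - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def simulate_full_sibs(n_fam, n_sib, p, rng):
    """Genotypes (0/1/2) for full-sib families, assuming unlinked loci."""
    m = p.size
    # Parental haplotypes: (family, parent, haplotype, locus)
    hap = (rng.random((n_fam, 2, 2, m)) < p).astype(np.int8)
    # Which of the two parental haplotypes each sib inherits at each locus
    pick = rng.integers(0, 2, size=(n_fam, n_sib, 2, m))
    gamete = np.where(pick == 0, hap[:, None, :, 0, :], hap[:, None, :, 1, :])
    M = gamete.sum(axis=2).reshape(n_fam * n_sib, m)
    fam = np.repeat(np.arange(n_fam), n_sib)
    return M, fam

def add_error(M, eps, p, rng):
    """Hypothetical kernel: with probability eps, redraw a genotype from
    Binomial(2, p), independently of the true genotype."""
    redraw = rng.binomial(2, p, size=M.shape)
    hit = rng.random(M.shape) < eps
    return np.where(hit, redraw, M)

rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=2000)      # assumed base allele frequencies
M, fam = simulate_full_sibs(40, 5, p, rng)

eps = 0.2
G_true = vanraden_grm(M, p)
G_err = vanraden_grm(add_error(M, eps, p, rng), p)

# Off-diagonal elements between full sibs (expected relationship 0.5)
sib = (fam[:, None] == fam[None, :]) & ~np.eye(fam.size, dtype=bool)
ratio = G_err[sib].mean() / G_true[sib].mean()

print("mean sib relationship, clean:", G_true[sib].mean())
print("mean sib relationship, noisy:", G_err[sib].mean())
print("ratio vs (1 - eps)^2:", ratio, (1 - eps) ** 2)
print("mean diagonal, clean vs noisy:", np.diag(G_true).mean(), np.diag(G_err).mean())
```

Under these assumptions an off-diagonal element keeps its signal only when both genotypes of a pair are unaltered, so the ratio printed above should be close to (1−ε)², while the mean diagonal stays near 1. The diagonal-to-off-diagonal ratio therefore changes with ε. Other error kernels, or centring by observed rather than base allele frequencies, would move the diagonal as well, which is why the sketch should not be read as reproducing the paper's three kernels.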