Impact of erroneous marker data on the accuracy of narrow-sense heritability

2026 Jan 07; 232(1).

Language: English. Country: United States. Medium: print.

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid41128645

Grant support
1312 Faculty of Forestry and Wood Sciences at the Czech University of Life Sciences in Prague

Genomic relationship matrices computed from single nucleotide polymorphism (SNP) data are now widely used to estimate narrow-sense heritability (h²), yet the impact of genotyping error on these estimates is not well understood. We used stochastic simulation and supporting algebra to examine this impact and its interplay with marker density. Starting from a diploid founder population with 300 additive quantitative trait loci, we simulated SNP panels with densities ranging from 6.25 to 50 SNPs per cM and traits with a true h² of either 0.2 or 0.6. Genotypes were then altered at error rates ε from 0 to 1 under three error kernels. For each of 100 simulation replicates, we calculated the genomic relationship matrix using VanRaden's method and estimated h² by restricted maximum likelihood (REML). In the absence of error, low-density marker panels underestimated h². Sparse panels were also the most tolerant of error, up to ε ≈ 0.1, yet still underestimated h². Conversely, the densest panel recovered the true h² when ε = 0, but even a small error rate (ε > 0.01) caused an upward bias. The analysis reveals that all distortions are attributable to (i) a shift in the mean off-diagonal elements of the genomic relationship matrix with magnitude (1 − ε)² and (ii) a change in the ratio between the mean diagonal and mean off-diagonal elements of the genomic relationship matrix. When ε ≳ 0.6, every kernel pushed h² toward zero. Thus, even modest genotyping error can inflate or deflate additive genetic variance estimates. SNP panels therefore require rigorous laboratory quality control, error-aware imputation, and statistical models that account for genotype uncertainty when estimating h².
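The two matrix-level effects named in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' simulation: the abstract does not define the three error kernels, so the kernel here is an assumption (with probability ε a genotype call is replaced by an independent draw from Hardy-Weinberg proportions at that locus). The sketch also assumes the true allele frequencies are known, uses pairs of individuals with identical true genotypes as a stand-in for relatives, and all function names (`vanraden_grm`, `add_errors`, `simulate`) are hypothetical. The relationship matrix follows the first method of VanRaden (2008), G = ZZ' / (2 Σ p(1 − p)) with Z = M − 2p.

```python
import random


def draw_genotype(p, rng):
    """Draw a 0/1/2 allele count under Hardy-Weinberg proportions."""
    return (rng.random() < p) + (rng.random() < p)


def vanraden_grm(genotypes, freqs):
    """VanRaden (2008) method 1: G = Z Z' / (2 * sum p(1-p)), with Z = M - 2p."""
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    z = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]
    n = len(z)
    grm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = sum(a * b for a, b in zip(z[i], z[j])) / denom
            grm[i][j] = grm[j][i] = value
    return grm


def add_errors(genotypes, freqs, eps, rng):
    """Assumed kernel: with probability eps, replace a call by a random redraw."""
    return [
        [draw_genotype(p, rng) if rng.random() < eps else g
         for g, p in zip(row, freqs)]
        for row in genotypes
    ]


def simulate(eps, n_pairs=10, n_markers=2000, seed=1):
    """Return (mean diagonal, mean within-pair off-diagonal) of the GRM."""
    geno_rng = random.Random(seed)      # true genotypes: same for every eps
    err_rng = random.Random(seed + 1)   # genotyping errors
    freqs = [geno_rng.uniform(0.1, 0.9) for _ in range(n_markers)]
    genotypes = []
    for _ in range(n_pairs):
        row = [draw_genotype(p, geno_rng) for p in freqs]
        genotypes += [row, list(row)]   # two individuals, identical true genotypes
    observed = add_errors(genotypes, freqs, eps, err_rng)
    grm = vanraden_grm(observed, freqs)
    n = len(grm)
    mean_diag = sum(grm[i][i] for i in range(n)) / n
    mean_off = sum(grm[2 * k][2 * k + 1] for k in range(n_pairs)) / n_pairs
    return mean_diag, mean_off


if __name__ == "__main__":
    diag0, off0 = simulate(0.0)
    diag1, off1 = simulate(0.2)
    print("off-diagonal ratio:", off1 / off0, "vs (1 - eps)^2 =", (1 - 0.2) ** 2)
    print("mean diagonal without / with error:", diag0, diag1)
```

Under this assumed kernel, a product of two centred genotypes survives only when both calls are error-free, so the within-pair off-diagonal elements shrink by a factor near (1 − ε)², while the diagonal keeps roughly the same mean because a redrawn call has the same per-locus variance. The diagonal to off-diagonal ratio therefore changes with ε. Other kernels, or allele frequencies estimated from the erroneous data, may behave differently.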


