Accurate sequencing of DNA motifs able to form alternative (non-B) structures

2023 Jun; 33(6): 907-922. [epub] 20230711

Language: English. Country: United States. Medium: print-electronic

Document type: journal article; Research Support, N.I.H., Extramural

Persistent link: https://www.medvik.cz/link/pmid37433640

Grant support
R01 GM136684 NIGMS NIH HHS - United States

Approximately 13% of the human genome consists of motifs with the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might have elevated error rates at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in Illumina. Deletion errors were increased for all non-B types except Z-DNA in Illumina and HiFi, but only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.
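The abstract does not spell out the probabilistic approach, so the following is only an illustrative sketch of how false positives could depend on sample size and variant frequency, not the authors' method. It assumes a simple binomial model in which each of `n_samples` samples independently produces an erroneous variant call at a site with probability `p_err`; the function names, parameters, and the example numbers are all hypothetical.

```python
from math import comb


def prob_at_least(n_samples, min_calls, p_err):
    """P(X >= min_calls) for X ~ Binomial(n_samples, p_err).

    Assumption: erroneous calls are independent across samples.
    """
    # Sum the probability of seeing fewer than min_calls erroneous calls,
    # then take the complement.
    below = sum(
        comb(n_samples, k) * p_err**k * (1 - p_err) ** (n_samples - k)
        for k in range(min_calls)
    )
    return 1.0 - below


def expected_false_positive_sites(n_sites, n_samples, min_calls, p_err):
    """Expected number of sites where errors alone reach min_calls calls.

    Assumption: all n_sites share the same per-sample error probability.
    """
    return n_sites * prob_at_least(n_samples, min_calls, p_err)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the trend.
    for n in (100, 1000, 10000):
        singletons = expected_false_positive_sites(1_000_000, n, 1, 1e-5)
        doubletons = expected_false_positive_sites(1_000_000, n, 2, 1e-5)
        print(n, round(singletons, 1), round(doubletons, 3))
```

Under these assumptions, the expected number of error-only sites grows with sample size and falls sharply as the required number of supporting samples increases, which is consistent with the abstract's caution about scoring rare variants. The complement-based sum loses precision for extremely small probabilities, so a log-space or survival-function implementation would be preferable for real analyses.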

