Probably Correct: Rescuing Repeats with Short and Long Reads
Language: English; Country: Switzerland; Medium: electronic
Document type: journal article, grant-supported work, review
PubMed: 33396198
PubMed Central: PMC7823596
DOI: 10.3390/genes12010048
PII: genes12010048
- Keywords
- long reads, multi-mapping, reference, repeats, satellite
- MeSH
- Centromere / chemistry MeSH
- Genome Size MeSH
- Genome, Human * MeSH
- Humans MeSH
- Chromosome Mapping / methods MeSH
- DNA Methylation MeSH
- Microsatellite Repeats * MeSH
- Sex Chromosomes / chemistry MeSH
- Telomere / chemistry MeSH
- Computational Biology / methods MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, because a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are the read length and the genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and to characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, which aids the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
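The dependence of multi-mapping on read length and genome complexity can be illustrated with a minimal Python sketch. It is not taken from the article: the toy genome, the 12-bp repeat, and the function name `unique_fraction` are all hypothetical, and reads are assumed to be error-free and placed by exact matching only, which real aligners do not require.

```python
from collections import Counter


def unique_fraction(genome: str, read_length: int) -> float:
    """Fraction of read start positions whose error-free read occurs exactly once.

    A read that occurs more than once in the genome has no unique placement,
    i.e. it would be multi-mapping under exact matching.
    """
    n_reads = len(genome) - read_length + 1
    if n_reads <= 0:
        raise ValueError("read_length must not exceed the genome length")
    # Count how often each possible read sequence occurs in the genome.
    counts = Counter(genome[i:i + read_length] for i in range(n_reads))
    unique = sum(
        1 for i in range(n_reads) if counts[genome[i:i + read_length]] == 1
    )
    return unique / n_reads


# Hypothetical toy genome: two identical copies of a 12-bp repeat
# embedded in otherwise arbitrary flanking sequence.
repeat = "ACGTTGCAAGCT"
toy_genome = "GGATCCTA" + repeat + "TTAGGCAT" + repeat + "CCGATAGG"

for read_length in (4, 8, 12, 16, 24):
    frac = unique_fraction(toy_genome, read_length)
    print(f"read length {read_length:2d}: {frac:.2f} of reads uniquely placed")
```

In this toy setting, any read that lies entirely within one repeat copy also matches the other copy, so it cannot be placed uniquely; a read can only become unique once it extends into the flanking sequence. A more repetitive genome (lower complexity) would need longer reads to reach the same uniquely placed fraction.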