Long-read sequence assembly: a technical evaluation in barley
Language: English; Country: England, Great Britain; Medium: print
Document type: Journal Article; Research Support, Non-U.S. Gov't
PubMed: 33710295
PubMed Central: PMC8290290
DOI: 10.1093/plcell/koab077
PII: 6169005
- MeSH terms
  - Molecular Sequence Annotation
  - Genome, Plant
  - Genomics / methods
  - DNA, Intergenic
  - Hordeum / genetics
  - Terminal Repeat Sequences
  - Retroelements
  - Sequence Analysis, DNA
  - Computational Biology / methods
  - High-Throughput Nucleotide Sequencing / methods
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
- Substances
  - DNA, Intergenic
  - Retroelements
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
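The coverage figures in the abstract rest on simple arithmetic: fold coverage is the total number of sequenced bases divided by genome size, so for a genome of roughly 5 Gb (an approximate barley figure, assumed here for illustration) 20-fold coverage corresponds to about 100 Gb of CCS reads. The Python sketch below shows that calculation, one simple way to downsample a read set to a lower target coverage (random selection without replacement; not necessarily the procedure used in the study), and the N50 statistic, a common contiguity measure assumed here as an example of the assembly metrics mentioned in the abstract. All function names are hypothetical.

```python
import random
from typing import List


def fold_coverage(read_lengths: List[int], genome_size: int) -> float:
    """Sequencing depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size


def downsample_reads(read_lengths: List[int], genome_size: int,
                     target_coverage: float, seed: int = 0) -> List[int]:
    """Return indices of a random read subset reaching ~target_coverage.

    Hypothetical helper: reads are visited in random order and kept until the
    target number of bases is reached, so the result may overshoot the target
    by up to one read length.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    target_bases = target_coverage * genome_size
    kept: List[int] = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        kept.append(i)
        total += read_lengths[i]
    return sorted(kept)


def n50(contig_lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    reads = [250] * 100  # toy data: 100 reads of 250 bp
    genome = 1_000       # toy genome size in bp
    print(fold_coverage(reads, genome))
    subset = downsample_reads(reads, genome, target_coverage=20)
    print(len(subset), fold_coverage([reads[i] for i in subset], genome))
    print(n50([10, 20, 30, 40]))
```

With 100 reads of 250 bp against a toy 1 kb genome, `fold_coverage` returns 25.0 and downsampling to 20-fold keeps 80 reads; for contigs of 10, 20, 30 and 40 bp the N50 is 30. Real read sets have variable lengths, which is why the sketch tolerates a small overshoot rather than trimming reads.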
Center for Integrated Breeding Research, Georg August University Göttingen, Göttingen 37073, Germany
Department of Plant and Microbial Biology, University of Zürich, Zürich 8008, Switzerland
German Centre for Integrative Biodiversity Research Halle-Jena-Leipzig, Leipzig 04103, Germany
Global Institute for Food Security, University of Saskatchewan, Saskatoon, SK S7N 4L8, Canada
HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben, Seeland 06466, Germany
The Sainsbury Laboratory, University of East Anglia, Norwich NR4 7UH, UK