Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods
Jazyk angličtina Země Anglie, Velká Británie Médium electronic
Typ dokumentu časopisecké články, práce podpořená grantem
PubMed
26984673
PubMed Central
PMC4793746
DOI
10.1186/s12864-016-2579-4
PII: 10.1186/s12864-016-2579-4
Knihovny.cz E-zdroje
- Klíčová slova
- Bioinformatics tool, GBS, Genome assembly, Genome map, Musa acuminata, Paired-end sequences,
- MeSH
- anotace sekvence MeSH
- banánovník genetika MeSH
- genetické markery MeSH
- genom rostlinný * MeSH
- kontigové mapování MeSH
- sekvenční analýza DNA MeSH
- výpočetní biologie metody MeSH
- vysoce účinné nukleotidové sekvenování MeSH
- Publikační typ
- časopisecké články MeSH
- práce podpořená grantem MeSH
- Názvy látek
- genetické markery MeSH
BACKGROUND: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
BioNano Genomics 9640 Towne Centre Drive San Diego CA 92121 USA
Bioversity International Parc Scientifique Agropolis 2 34397 Montpellier Cedex 5 France
CIRAD UMR AGAP TA A 108 03 Avenue Agropolis F 34398 Montpellier cedex 5 France
Commissariat à l'Energie Atomique Genoscope 2 rue Gaston Cremieux BP5706 91057 Evry France
Diversity Arrays Technology Yarralumla Australian Capital Territory 2600 Australia
Zobrazit více v PubMed
Bolger ME, Weisshaar B, Scholz U, Stein N, Usadel B, Mayer KF. Plant genome sequencing — applications for crop improvement. Curr Opin Biotechnol. 2014;26:31–37. doi: 10.1016/j.copbio.2013.08.019. PubMed DOI
Feuillet C, Leach JE, Rogers J, Schnable PS, Eversole K. Crop genome sequencing: lessons and rationales. Trends Plant Sci. 2011;16:77–88. doi: 10.1016/j.tplants.2010.10.005. PubMed DOI
Michael TP, Jackson S. The First 50 Plant Genomes. Plant Genome. 2013;6:1–7. doi: 10.3835/plantgenome2013.03.0001in. DOI
Kejnovsky E, Hawkins J, Feschotte C. Plant Transposable Elements: Biology and Evolution. In: Wendel JF, Greilhuber J, Dolezel J, Leitch IJ, editors. Plant Genome Diversity. Vienna: Springer; 2012. pp. 17–34.
Hahn MW, Zhang SV, Moyle LC. Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations. G3 Genes Genomes Genetics. 2014;4:669–79. PubMed PMC
Vanneste K, Maere S, Van de Peer Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos Trans R Soc B Biol Sci. 2014;369:1–13. doi: 10.1098/rstb.2013.0353. PubMed DOI PMC
Alkan C, Sajjadian S, Eichler EE. Limitations of next-generation genome sequence assembly. Nat Methods. 2011;8:61–65. doi: 10.1038/nmeth.1527. PubMed DOI PMC
Mardis ER. A decade’s perspective on DNA sequencing technology. Nature. 2011;470:198–203. doi: 10.1038/nature09796. PubMed DOI
Williams LJS, Tabbaa DG, Li N, Berlin AM, Shea TP, MacCallum I, Lawrence MS, Drier Y, Getz G, Young SK, Jaffe DB, Nusbaum C, Gnirke A. Paired-end sequencing of Fosmid libraries by Illumina. Genome Res. 2012;22:2241–2249. doi: 10.1101/gr.138925.112. PubMed DOI PMC
Dong Y, Xie M, Jiang Y, Xiao N, Du X, Zhang W, Tosser-Klopp G, Wang J, Yang S, Liang J, Chen W, Chen J, Zeng P, Hou Y, Bian C, Pan S, Li Y, Liu X, Wang W, Servin B, Sayre B, Zhu B, Sweeney D, Moore R, Nie W, Shen Y, Zhao R, Zhang G, Li J, Faraut T, et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus) Nat Biotechnol. 2013;31:135–141. doi: 10.1038/nbt.2478. PubMed DOI
Levy-Sakin M, Ebenstein Y. Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy. Curr Opin Biotechnol. 2013;24:690–698. doi: 10.1016/j.copbio.2013.01.009. PubMed DOI
Neely RK, Deen J, Hofkens J. Optical mapping of DNA: Single-molecule-based methods for mapping genomes. Biopolymers. 2011;95:298–311. doi: 10.1002/bip.21579. PubMed DOI
Mascher M, Stein N. Genetic anchoring of whole-genome shotgun assemblies. Front Genet. 2014;5:1–7. doi: 10.3389/fgene.2014.00208. PubMed DOI PMC
Mascher M, Muehlbauer GJ, Rokhsar DS, Chapman J, Schmutz J, Barry K, Muñoz-Amatriaín M, Close TJ, Wise RP, Schulman AH, Himmelbach A, Mayer KFX, Scholz U, Poland JA, Stein N, Waugh R. Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ) Plant J. 2013;76:718–727. doi: 10.1111/tpj.12319. PubMed DOI PMC
Schatz M, Witkowski J, McCombie WR. Current challenges in de novo plant genome sequencing and assembly. Genome Biol. 2012;13:243. doi: 10.1186/gb-2012-13-4-243. PubMed DOI PMC
Pop M, Kosack DS, Salzberg SL. Hierarchical Scaffolding With Bambus. Genome Res. 2004;14:149–159. doi: 10.1101/gr.1536204. PubMed DOI PMC
Dayarian A, Michael T, Sengupta A. SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics. 2010;11:345. doi: 10.1186/1471-2105-11-345. PubMed DOI PMC
Salmela L, Mäkinen V, Välimäki N, Ylinen J, Ukkonen E. Fast scaffolding with small independent mixed integer programs. Bioinformatics. 2011;27:3259–3265. doi: 10.1093/bioinformatics/btr562. PubMed DOI PMC
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–579. doi: 10.1093/bioinformatics/btq683. PubMed DOI
Gao S, Sung W-K, Nagarajan N. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol. 2011;18:1681–1691. doi: 10.1089/cmb.2011.0170. PubMed DOI PMC
Gritsenko AA, Nijkamp JF, Reinders MJT, de Ridder D. GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies. Bioinformatics. 2012;28:1429–1437. doi: 10.1093/bioinformatics/bts175. PubMed DOI
Donmez N, Brudno M. SCARPA: scaffolding reads with practical algorithms. Bioinformatics. 2013;29:428–434. doi: 10.1093/bioinformatics/bts716. PubMed DOI
Boetzer M, Pirovano W. SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 2014;15:211. doi: 10.1186/1471-2105-15-211. PubMed DOI PMC
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J, Wu G, Zhang H, Shi Y, Liu Y, Yu C, Wang B, Lu Y, Han C, Cheung D, Yiu S-M, Peng S, Xiaoqian Z, Liu G, Liao X, Li Y, Yang H, Wang J, Lam T-W, Wang J. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012;1:18. doi: 10.1186/2047-217X-1-18. PubMed DOI PMC
Boetzer M, Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012;13:R56. doi: 10.1186/gb-2012-13-6-r56. PubMed DOI PMC
Swain MT, Tsai IJ, Assefa SA, Newbold C, Berriman M, Otto TD. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat Protoc. 2012;7:1260–1284. doi: 10.1038/nprot.2012.068. PubMed DOI PMC
D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, Da Silva C, Jabbari K, Cardi C, Poulain J, Souquet M, Labadie K, Jourda C, Lengelle J, Rodier-Goud M, Alberti A, Bernard M, Correa M, Ayyampalayam S, Mckain MR, Leebens-Mack J, Burgess D, Freeling M, Mbeguie-A-Mbeguie D, Chabannes M, Wicker T, et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–217. doi: 10.1038/nature11241. PubMed DOI
Jourda C, Cardi C, Mbéguié-A-Mbéguié D, Bocs S, Garsmeur O, D’Hont A, Yahiaoui N. Expansion of banana (Musa acuminata) gene families involved in ethylene biosynthesis and signalling after lineage-specific whole-genome duplications. New Phytol. 2014;202:986–1000. doi: 10.1111/nph.12710. PubMed DOI
Garsmeur O, Schnable JC, Almeida A, Jourda C, D’Hont A, Freeling M. Two Evolutionarily Distinct Classes of Paleopolyploidy. Mol Biol Evol. 2014;31:448–454. doi: 10.1093/molbev/mst230. PubMed DOI
Cenci A, Guignon V, Roux N, Rouard M. Genomic analysis of NAC transcription factors in banana (Musa acuminata) and definition of NAC orthologous groups for monocots and dicots. Plant Mol Biol. 2014;85:63–80. doi: 10.1007/s11103-013-0169-2. PubMed DOI PMC
Chen J, Hu Q, Zhang Y, Lu C, Kuang H. P-MITE: a database for plant miniature inverted-repeat transposable elements. Nucleic Acids Res. 2014;42:D1176–D1181. doi: 10.1093/nar/gkt1000. PubMed DOI PMC
Golicz AA, Schliep M, Lee HT, Larkum AWD, Dolferus R, Batley J, Chan C-KK, Sablok G, Ralph PJ, Edwards D. Genome-wide survey of the seagrass Zostera muelleri suggests modification of the ethylene signalling network. J Exp Bot. 2015;66:1489–98. doi: 10.1093/jxb/eru510. PubMed DOI PMC
Sampedro J, Guttman M, Li L-C, Cosgrove DJ. Evolutionary divergence of β–expansin structure and function in grasses parallels emergence of distinctive primary cell wall traits. Plant J. 2015;81:108–120. doi: 10.1111/tpj.12715. PubMed DOI
De Smet R, Adams KL, Vandepoele K, Van Montagu MCE, Maere S, Van de Peer Y. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants. Proc Natl Acad Sci. 2013;110:2898–2903. doi: 10.1073/pnas.1300127110. PubMed DOI PMC
Chain PSG, Grafham DV, Fulton RS, FitzGerald MG, Hostetler J, Muzny D, Ali J, Birren B, Bruce DC, Buhay C, Cole JR, Ding Y, Dugan S, Field D, Garrity GM, Gibbs R, Graves T, Han CS, Harrison SH, Highlander S, Hugenholtz P, Khouri HM, Kodira CD, Kolker E, Kyrpides NC, Lang D, Lapidus A, Malfatti SA, Markowitz V, Metha T, et al. Genome Project Standards in a New Era of Sequencing. Science. 2009;326:236–237. doi: 10.1126/science.1180614. PubMed DOI PMC
Šimková H, Číhalíková J, Vrána J, Lysák M, Doležel J. Preparation of HMW DNA from Plant Nuclei and Chromosomes Isolated from Root Tips. Biol Plant. 2003;46:369–373. doi: 10.1023/A:1024322001786. DOI
Cruz VM. Molecular Genetic Characterization of Lesquerella New Industrial Crop Using DArTseq Markers. In Plant and Animal Genome XXI Conference, San Diego, CA, USA. Plant and Animal Genome. 2013.
Li H, Durbin R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698. PubMed DOI PMC
Van Ooijen JW. Multipoint maximum likelihood mapping in a full-sib family of an outbreeding species. Genet Res. 2011;93:343–349. doi: 10.1017/S0016672311000279. PubMed DOI
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. PubMed DOI PMC
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2. PubMed DOI
Krzywinski M, Schein J, Birol İ, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109. PubMed DOI PMC
Anantharaman T, Mishra B. Proc. of WABI. 2001. A Probabilistic Analysis of False Positives in Optical Map Alignment and Validation; pp. 27–40.
Nguyen JV. Genomic Mapping: A Statistical and Algorithmic Analysis of the Optical Mapping System. Los Angeles, CA, USA: University of Southern California; 2010.
Pendleton M, Sebra R, Pang AWC, Ummat A, Franzen O, Rausch T, Stütz AM, Stedman W, Anantharaman T, Hastie A, Dai H, Fritz MH-Y, Cao H, Cohain A, Deikus G, Durrett RE, Blanchard SC, Altman R, Chin C-S, Guo Y, Paxinos EE, Korbel JO, Darnell RB, McCombie WR, Kwok P-Y, Mason CE, Schadt EE, Bashir A. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat Methods. 2015;12:780–786. doi: 10.1038/nmeth.3454. PubMed DOI PMC
Slater G, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi: 10.1186/1471-2105-6-31. PubMed DOI PMC
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. PubMed DOI PMC
Muggli MD, Puglisi SJ, Ronen R, Boucher C. Misassembly detection using paired-end sequence reads and optical mapping data. Bioinformatics. 2015;31:80–88. doi: 10.1093/bioinformatics/btv262. PubMed DOI PMC
Chen M, Presting G, Barbazuk WB, Goicoechea JL, Blackmon B, Fang G, Kim H, Frisch D, Yu Y, Sun S, Higingbottom S, Phimphilai J, Phimphilai D, Thurmond S, Gaudette B, Li P, Liu J, Hatfield J, Main D, Farrar K, Henderson C, Barnett L, Costa R, Williams B, Walser S, Atkins M, Hall C, Budiman MA, Tomkins JP, Luo M, et al. An Integrated Physical and Genetic Map of the Rice Genome. Plant Cell Online. 2002;14:537–545. doi: 10.1105/tpc.010485. PubMed DOI PMC
Gill KS, Gill BS, Endo TR, Taylor T. Identification and high-density mapping of gene-rich regions in chromosome group 1 of wheat. Genetics. 1996;144:1883–1891. PubMed PMC
Hall SE, Kettler G, Preuss D. Centromere Satellites From Arabidopsis Populations: Maintenance of Conserved and Variable Domains. Genome Res. 2003;13:195–205. doi: 10.1101/gr.593403. PubMed DOI PMC
Wu J, Mizuno H, Hayashi-Tsugane M, Ito Y, Chiden Y, Fujisawa M, Katagiri S, Saji S, Yoshiki S, Karasawa W, Yoshihara R, Hayashi A, Kobayashi H, Ito K, Hamada M, Okamoto M, Ikeno M, Ichikawa Y, Katayose Y, Yano M, Matsumoto T, Sasaki T. Physical maps and recombination frequency of six rice chromosomes. Plant J. 2003;36:720–730. doi: 10.1046/j.1365-313X.2003.01903.x. PubMed DOI
Droc G, Larivière D, Guignon V, Yahiaoui N, This D, Garsmeur O, Dereeper A, Hamelin C, Argout X, Dufayard J-F, Lengelle J, Baurens F-C, Cenci A, Pitollat B, D’Hont A, Ruiz M, Rouard M, Bocs S. The Banana Genome Hub. Database. 2013;2013:1–14. doi: 10.1093/database/bat035. PubMed DOI PMC
Advances in the Molecular Cytogenetics of Bananas, Family Musaceae
Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing
Trait variation and genetic diversity in a banana genomic selection training population