
Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods

G. Martin, FC. Baurens, G. Droc, M. Rouard, A. Cenci, A. Kilian, A. Hastie, J. Doležel, JM. Aury, A. Alberti, F. Carreel, A. D'Hont

BMC Genomics. 2016 ; 17 (-) : 243. [pub] 20160316

Language English Country England, Great Britain

Document type Journal Article; Research Support, Non-U.S. Gov't

Persistent link   https://www.medvik.cz/link/bmc17000340

BACKGROUND: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
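
The abstract reports each N50 together with the number of scaffolds needed to reach it (for example, 3.0 Mb in 26 scaffolds). As a minimal sketch of how this statistic is conventionally computed, and not the authors' pipeline, the following Python function returns both values from a list of scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; only the scaffold counts 7513 and 1532 come from the record.

```python
def n50(lengths):
    """Return (N50 length, number of scaffolds needed to cover half the assembly)."""
    ordered = sorted(lengths, reverse=True)   # longest scaffolds first
    half = sum(ordered) / 2                   # half of the total assembly length
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # This scaffold is the one that crosses the 50% mark.
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not data from the paper).
toy_lengths = [10, 8, 6, 4, 2]
toy_n50, toy_count = n50(toy_lengths)

# Relative reduction in scaffold number reported in the abstract.
reduction_pct = 100 * (7513 - 1532) / 7513
```

With the toy lengths, the two longest scaffolds (10 + 8 = 18) already cover at least half of the total of 30, so the function yields an N50 of 8 reached with 2 scaffolds. The reduction from 7513 to 1532 scaffolds works out to about 79.6%, consistent with the rounded 80% in the abstract.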


000      
00000naa a2200000 a 4500
001      
bmc17000340
003      
CZ-PrNML
005      
20170112123131.0
007      
ta
008      
170103s2016 enk f 000 0|eng||
009      
AR
024    7_
$a 10.1186/s12864-016-2579-4 $2 doi
035    __
$a (PubMed)26984673
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Martin, Guillaume $u CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
245    10
$a Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods / $c G. Martin, FC. Baurens, G. Droc, M. Rouard, A. Cenci, A. Kilian, A. Hastie, J. Doležel, JM. Aury, A. Alberti, F. Carreel, A. D'Hont.
520    9_
$a BACKGROUND: Recent advances in genomics indicate functional significance of a majority of genome sequences and their long range interactions. As a detailed examination of genome organization and function requires very high quality genome sequence, the objective of this study was to improve reference genome assembly of banana (Musa acuminata). RESULTS: We have developed a modular bioinformatics pipeline to improve genome sequence assemblies, which can handle various types of data. The pipeline comprises several semi-automated tools. However, unlike classical automated tools that are based on global parameters, the semi-automated tools proposed an expert mode for a user who can decide on suggested improvements through local compromises. The pipeline was used to improve the draft genome sequence of Musa acuminata. Genotyping by sequencing (GBS) of a segregating population and paired-end sequencing were used to detect and correct scaffold misassemblies. Long insert size paired-end reads identified scaffold junctions and fusions missed by automated assembly methods. GBS markers were used to anchor scaffolds to pseudo-molecules with a new bioinformatics approach that avoids the tedious step of marker ordering during genetic map construction. Furthermore, a genome map was constructed and used to assemble scaffolds into super scaffolds. Finally, a consensus gene annotation was projected on the new assembly from two pre-existing annotations. This approach reduced the total Musa scaffold number from 7513 to 1532 (i.e. by 80%), with an N50 that increased from 1.3 Mb (65 scaffolds) to 3.0 Mb (26 scaffolds). 89.5% of the assembly was anchored to the 11 Musa chromosomes compared to the previous 70%. Unknown sites (N) were reduced from 17.3 to 10.0%. CONCLUSION: The release of the Musa acuminata reference genome version 2 provides a platform for detailed analysis of banana genome variation, function and evolution. 
Bioinformatics tools developed in this work can be used to improve genome sequence assemblies in other species.
650    _2
$a Computational Biology $x methods $7 D019295
650    _2
$a Contig Mapping $7 D020451
650    _2
$a Genetic Markers $7 D005819
650    12
$a Genome, Plant $7 D018745
650    _2
$a High-Throughput Nucleotide Sequencing $7 D059014
650    _2
$a Molecular Sequence Annotation $7 D058977
650    _2
$a Musa $x genetics $7 D028521
650    _2
$a Sequence Analysis, DNA $7 D017422
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
700    1_
$a Baurens, Franc-Christophe $u CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
700    1_
$a Droc, Gaëtan $u CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
700    1_
$a Rouard, Mathieu $u Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France.
700    1_
$a Cenci, Alberto $u Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, Cedex 5, France.
700    1_
$a Kilian, Andrzej $u Diversity Arrays Technology, Yarralumla, Australian Capital Territory, 2600, Australia.
700    1_
$a Hastie, Alex $u BioNano Genomics, 9640 Towne Centre Drive, San Diego, CA, 92121, USA.
700    1_
$a Doležel, Jaroslav $u Institute of Experimental Botany, Centre of the Region Hana for Biotechnological and Agricultural Research, Šlechtitelů 31, CZ-78371, Olomouc, Czech Republic.
700    1_
$a Aury, Jean-Marc $u Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France. $7 gn_A_00010156
700    1_
$a Alberti, Adriana $u Commissariat à l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Cremieux, BP5706, 91057, Evry, France. $7 gn_A_00003471
700    1_
$a Carreel, Françoise $u CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France.
700    1_
$a D'Hont, Angélique $u CIRAD (Centre de coopération Internationale en Recherche Agronomique pour le Développement), UMR AGAP, TA A-108/03, Avenue Agropolis, F-34398, Montpellier, cedex 5, France. dhont@cirad.fr.
773    0_
$w MED00008181 $t BMC genomics $x 1471-2164 $g Vol. 17, No. - (2016), p. 243
856    41
$u https://pubmed.ncbi.nlm.nih.gov/26984673 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20170103 $b ABA008
991    __
$a 20170112123230 $b ABA008
999    __
$a ok $b bmc $g 1179480 $s 960907
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2016 $b 17 $c - $d 243 $e 20160316 $i 1471-2164 $m BMC genomics $n BMC Genomics $x MED00008181
LZP    __
$a Pubmed-20170103
