Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

P. Novák, P. Neumann, J. Macas

BMC Bioinformatics. 2010; 11: 378. [pub] 20100715

Language: English; Country: England, United Kingdom

Document type: journal articles, grant-supported work

Persistent link   https://www.medvik.cz/link/bmc12026244

BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
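The idea in the RESULTS section, reads as graph nodes, similarity hits as edges, and cluster size as a proxy for genome proportion, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain connected components (via union-find) as a simplified stand-in for their partitioning method, and the names `cluster_reads`, `genome_proportions`, and the toy read identifiers are hypothetical.

```python
def cluster_reads(read_ids, similar_pairs):
    """Group reads into connected components of a similarity graph.

    Assumption: `similar_pairs` lists read pairs that already passed some
    similarity threshold; how that threshold is chosen is out of scope here.
    """
    parent = {r: r for r in read_ids}

    def find(r):
        # Follow parent links to the component root, shortening the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for r in read_ids:
        clusters.setdefault(find(r), []).append(r)
    # Largest clusters first, since they correspond to the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters, total_reads):
    """Estimate each cluster's share of the genome as its share of all reads."""
    return [len(c) / total_reads for c in clusters]


# Toy data (hypothetical): six reads, three similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
clusters = cluster_reads(reads, pairs)
proportions = genome_proportions(clusters, len(reads))
```

With random low-pass sampling, a cluster holding half of the reads would suggest a repeat family occupying roughly half of the sampled genome. Single-read clusters such as `r6` would typically be treated as unclustered sequence.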

000      
00000naa a2200000 a 4500
001      
bmc12026244
003      
CZ-PrNML
005      
20121206121332.0
007      
ta
008      
120817e20100715enk f 000 0#eng||
009      
AR
024    7_
$a 10.1186/1471-2105-11-378 $2 doi
035    __
$a (PubMed)20633259
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Novák, Petr $u Biology Centre ASCR, Institute of Plant Molecular Biology, Branisovska 31, Ceske Budejovice, CZ-37005, Czech Republic.
245    10
$a Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data / $c P. Novák, P. Neumann, J. Macas,
520    9_
$a BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
650    _2
$a chromosome mapping $7 D002874
650    _2
$a cluster analysis $7 D016000
650    _2
$a DNA, plant $x genetics $7 D018744
650    _2
$a genome, plant $7 D018745
650    _2
$a garden pea $x genetics $7 D018532
650    _2
$a repetitive sequences, nucleic acid $7 D012091
650    _2
$a sequence analysis, DNA $7 D017422
650    _2
$a Glycine max $x genetics $7 D013025
655    _2
$a journal articles $7 D016428
655    _2
$a grant-supported work $7 D013485
700    1_
$a Neumann, Pavel
700    1_
$a Macas, Jiří
773    0_
$w MED00008167 $t BMC Bioinformatics $x 1471-2105 $g Vol. 11 (20100715), p. 378
856    41
$u https://pubmed.ncbi.nlm.nih.gov/20633259 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y m
990    __
$a 20120817 $b ABA008
991    __
$a 20121206121406 $b ABA008
999    __
$a ok $b bmc $g 948286 $s 783590
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2010 $b 11 $d 378 $e 20100715 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167
LZP    __
$a Pubmed-20120817/10/04
