Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

P. Novák, P. Neumann, J. Macas

BMC Bioinformatics. 2010; 11: 378. Published 2010-07-15

Source: BMC Bioinformatics, ISSN 1471-2105

Language: English. Country: England, United Kingdom

Document type: Journal Article; Research Support, Non-U.S. Gov't

Persistent link: https://www.medvik.cz/link/bmc12026244

Online full text

NLK BioMedCentral from 2000-01-12
BioMedCentral Open Access from 2000
Directory of Open Access Journals from 2000
Free Medical Journals from 2000
PubMed Central from 2000
Europe PubMed Central from 2000
ProQuest Central from 2009-01-01
Open Access Digital Library from 2000-01-01
Open Access Digital Library from 2000-07-01
Medline Complete (EBSCOhost) from 2000-01-01
Health & Medicine (ProQuest) from 2009-01-01
ROAD: Directory of Open Access Scholarly Resources from 2000
Springer Nature OA/Free Journals from 2000-12-01

PubMed 20633259
DOI 10.1186/1471-2105-11-378

MeSH
DNA, Plant / genetics
Genome, Plant
Glycine max / genetics
Pisum sativum (pea) / genetics
Chromosome Mapping
Repetitive Sequences, Nucleic Acid
Sequence Analysis, DNA
Cluster Analysis
Publication type
Journal Article
Research Support, Non-U.S. Gov't

BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
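
The general idea in the abstract (reads as graph vertices, sequence similarity as edges, clusters as candidate repeat families, cluster size as a proxy for genome proportion) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the partitioning algorithm, so plain connected components (via union-find) are used here as a stand-in, and the read identifiers, the similarity hits, and the function names `cluster_reads` and `cluster_proportions` are all hypothetical.

```python
def cluster_reads(read_ids, hits):
    """Group reads into clusters of mutually reachable reads.

    Reads are vertices; each (a, b) pair in `hits` is an edge meaning the
    two reads passed some similarity threshold (the threshold itself is
    assumed to have been applied upstream). Connected components stand in
    for the partitioning step, which the abstract does not specify.
    """
    parent = {r: r for r in read_ids}

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for r in read_ids:
        clusters.setdefault(find(r), []).append(r)
    # Largest clusters first: these would be the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


def cluster_proportions(clusters, total_reads):
    """Estimate each cluster's share of the genome as its share of reads."""
    return [len(c) / total_reads for c in clusters]


# Hypothetical toy input: eight reads and four pairwise similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
hits = [("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r5", "r6")]

clusters = cluster_reads(reads, hits)
proportions = cluster_proportions(clusters, len(reads))
for members, share in zip(clusters, proportions):
    print(len(members), f"{share:.3f}", members)
```

In this toy input, reads r7 and r8 have no similarity hits and end up as single-read clusters, which is how low-copy sequences would be expected to appear under low-pass coverage. Equating read share with genome proportion assumes the reads sample the genome roughly uniformly.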

Biology Centre ASCR, Institute of Plant Molecular Biology, Branisovska 31, Ceske Budejovice, CZ-37005, Czech Republic

000: 00000naa a2200000 a 4500

001: bmc12026244

003: CZ-PrNML

005: 20121206121332.0

007: ta

008: 120817e20100715enk f 000 0#eng||

009: AR

024 7_: $a 10.1186/1471-2105-11-378 $2 doi

035 __: $a (PubMed)20633259

040 __: $a ABA008 $b cze $d ABA008 $e AACR2

041 0_: $a eng

044 __: $a enk

100 1_: $a Novák, Petr $u Biology Centre ASCR, Institute of Plant Molecular Biology, Branisovska 31, Ceske Budejovice, CZ-37005, Czech Republic.

245 10: $a Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data / $c P. Novák, P. Neumann, J. Macas,

520 9_: $a BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.

650 _2: $a mapování chromozomů $7 D002874

650 _2: $a shluková analýza $7 D016000

650 _2: $a DNA rostlinná $x genetika $7 D018744

650 _2: $a genom rostlinný $7 D018745

650 _2: $a hrách setý $x genetika $7 D018532

650 _2: $a repetitivní sekvence nukleových kyselin $7 D012091

650 _2: $a sekvenční analýza DNA $7 D017422

650 _2: $a Glycine max $x genetika $7 D013025

655 _2: $a časopisecké články $7 D016428

655 _2: $a práce podpořená grantem $7 D013485

700 1_: $a Neumann, Pavel

700 1_: $a Macas, Jirí

773 0_: $w MED00008167 $t BMC Bioinformatics $x 1471-2105 $g Roč. 11(20100715), s. 378

856 41: $u https://pubmed.ncbi.nlm.nih.gov/20633259 $y Pubmed

910 __: $a ABA008 $b sig $c sign $y m

990 __: $a 20120817 $b ABA008

991 __: $a 20121206121406 $b ABA008

999 __: $a ok $b bmc $g 948286 $s 783590

BAS __: $a 3

BAS __: $a PreBMC

BMC __: $a 2010 $b 11 $d 378 $e 20100715 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167

LZP __: $a Pubmed-20120817/10/04
