Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data
P. Novák, P. Neumann, J. Macas
Language: English; Country: England, Great Britain
Document type: Journal Article; Research Support, Non-U.S. Gov't
NLK
BioMedCentral
from 2000-12-01
BioMedCentral Open Access
from 2000
Directory of Open Access Journals
from 2000
Free Medical Journals
from 2000
PubMed Central
from 2000
Europe PubMed Central
from 2000
ProQuest Central
from 2009-01-01
Open Access Digital Library
from 2000-01-01
Open Access Digital Library
from 2000-07-01
Open Access Digital Library
from 2000-01-01
Medline Complete (EBSCOhost)
from 2000-01-01
Health & Medicine (ProQuest)
from 2009-01-01
ROAD: Directory of Open Access Scholarly Resources
from 2000
Springer Nature OA/Free Journals
from 2000-12-01
- MeSH
- DNA, Plant / genetics
- Genome, Plant
- Glycine max / genetics
- Pisum sativum / genetics
- Chromosome Mapping
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA
- Cluster Analysis
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization.

RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families.

CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
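The idea in the abstract, partitioning reads by similarity and reading repeat abundance off the cluster sizes, can be illustrated with a minimal sketch. It rests on assumptions not stated in the record: reads are treated as graph vertices, each pairwise similarity hit as an edge, and clusters as plain connected components found with union-find. The article's actual partitioning method may well be different and more refined. The function names (`cluster_reads`, `genome_proportions`), the `min_size` cutoff, and the toy hit list are all hypothetical.

```python
from collections import Counter


def cluster_reads(n_reads, hits):
    """Return cluster sizes (largest first) for a read similarity graph.

    Assumption: a cluster is a connected component of the graph whose
    vertices are reads 0..n_reads-1 and whose edges are the pairs in `hits`.
    """
    parent = list(range(n_reads))

    def find(x):
        # Follow parent links to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    sizes = Counter(find(i) for i in range(n_reads))
    return sorted(sizes.values(), reverse=True)


def genome_proportions(n_reads, hits, min_size=2):
    """Estimate each cluster's share of the genome as its share of all reads.

    Assumption: reads sample the genome roughly uniformly, so the fraction
    of reads in a cluster approximates the fraction of the genome it covers.
    Clusters smaller than `min_size` (a hypothetical cutoff) are ignored.
    """
    sizes = cluster_reads(n_reads, hits)
    return [size / n_reads for size in sizes if size >= min_size]


# Toy data (invented): 8 reads, with similarity hits linking reads 0-3 and 4-5.
hits = [(0, 1), (1, 2), (2, 3), (4, 5)]
proportions = genome_proportions(8, hits)
print(proportions)
```

In this toy case the two multi-read clusters hold 4 and 2 of the 8 reads, and reads 6 and 7 remain unclustered singletons. In practice the hit list would come from an all-to-all sequence comparison of the reads, which is outside the scope of this sketch.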
- 000
- 00000naa a2200000 a 4500
- 001
- bmc12026244
- 003
- CZ-PrNML
- 005
- 20121206121332.0
- 007
- ta
- 008
- 120817e20100715enk f 000 0#eng||
- 009
- AR
- 024 7_
- $a 10.1186/1471-2105-11-378 $2 doi
- 035 __
- $a (PubMed)20633259
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a enk
- 100 1_
- $a Novák, Petr $u Biology Centre ASCR, Institute of Plant Molecular Biology, Branisovska 31, Ceske Budejovice, CZ-37005, Czech Republic.
- 245 10
- $a Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data / $c P. Novák, P. Neumann, J. Macas,
- 520 9_
- $a BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
- 650 _2
- $a mapování chromozomů $7 D002874
- 650 _2
- $a shluková analýza $7 D016000
- 650 _2
- $a DNA rostlinná $x genetika $7 D018744
- 650 _2
- $a genom rostlinný $7 D018745
- 650 _2
- $a hrách setý $x genetika $7 D018532
- 650 _2
- $a repetitivní sekvence nukleových kyselin $7 D012091
- 650 _2
- $a sekvenční analýza DNA $7 D017422
- 650 _2
- $a Glycine max $x genetika $7 D013025
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a práce podpořená grantem $7 D013485
- 700 1_
- $a Neumann, Pavel
- 700 1_
- $a Macas, Jirí
- 773 0_
- $w MED00008167 $t BMC Bioinformatics $x 1471-2105 $g Roč. 11(20100715), s. 378
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/20633259 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y m
- 990 __
- $a 20120817 $b ABA008
- 991 __
- $a 20121206121406 $b ABA008
- 999 __
- $a ok $b bmc $g 948286 $s 783590
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2010 $b 11 $d 378 $e 20100715 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167
- LZP __
- $a Pubmed-20120817/10/04