TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads
P. Novák, L. Ávila Robledillo, A. Koblížková, I. Vrbová, P. Neumann, J. Macas
Language English Country Great Britain
Document type Journal Article
NLK
Directory of Open Access Journals
from 2005
Free Medical Journals
from 1996
PubMed Central
from 1974
Europe PubMed Central
from 1974
Open Access Digital Library
from 1996-01-01 to 2030-12-31
Open Access Digital Library
from 1974-01-01
Open Access Digital Library
from 1996-01-01
Medline Complete (EBSCOhost)
from 1996-01-01
Oxford Journals Open Access Collection
from 1996-01-01
ROAD: Directory of Open Access Scholarly Resources
from 1974
PubMed
28402514
DOI
10.1093/nar/gkx257
Knihovny.cz E-resources
- MeSH
- DNA, Plant / genetics
- Genome, Plant *
- Pisum sativum / genetics
- In Situ Hybridization, Fluorescence
- Consensus Sequence
- Zea mays / genetics
- Magnoliopsida / genetics
- Chromosome Mapping / methods
- Metaphase
- Computer Graphics
- Cyperaceae / genetics
- DNA, Satellite / classification / genetics
- Base Sequence
- Sequence Analysis, DNA
- Cluster Analysis
- Software *
- Vicia faba / genetics
- Publication type
- Journal Article
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
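The k-mer step described in the abstract can be illustrated with a minimal Python sketch. It is not TAREAN's actual implementation: the function names `count_kmers` and `greedy_monomer`, the greedy walk, and the toy data are assumptions made for illustration only. The sketch decomposes reads into k-mers, counts them, and then walks from the most frequent k-mer to its most frequent successor until the walk returns to its start, which is the circular signature a tandem repeat would be expected to leave. Reverse-complement strands, sequencing errors, and the graph-based clustering stage are all ignored here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Decompose each read into overlapping k-mers and count them."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(counts, k, max_len=10000):
    """Greedy walk over k-mers (illustrative assumption, not TAREAN's method).

    Starts at the most frequent k-mer and repeatedly appends the base that
    yields the most frequent following k-mer. Returns the appended bases if
    the walk closes into a cycle at the starting k-mer, otherwise None.
    The result is one rotation of the repeat monomer.
    """
    if not counts:
        return None
    start = counts.most_common(1)[0][0]
    current = start
    seen = {start}
    monomer = []
    for _ in range(max_len):
        # Score the four possible one-base extensions of the current k-mer.
        best_count, best_base = max(
            (counts[current[1:] + base], base) for base in "ACGT"
        )
        if best_count == 0:
            return None  # dead end: no observed successor
        monomer.append(best_base)
        current = current[1:] + best_base
        if current == start:
            return "".join(monomer)  # cycle closed
        if current in seen:
            return None  # looped back somewhere other than the start
        seen.add(current)
    return None


# Toy data (hypothetical): short reads sampled from a perfect tandem array.
toy_monomer = "ACGGTCATTG"
toy_array = toy_monomer * 20
toy_reads = [toy_array[i:i + 30] for i in range(0, 170, 7)]

toy_counts = count_kmers(toy_reads, 5)
print(greedy_monomer(toy_counts, 5))
```

On the toy data the walk recovers a 10 bp sequence that is a rotation of the monomer used to build the array. Reads from a non-repetitive sequence reach a dead end instead, and the function returns `None`.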
- 000
- 00000naa a2200000 a 4500
- 001
- bmc18016665
- 003
- CZ-PrNML
- 005
- 20180515103609.0
- 007
- ta
- 008
- 180515s2017 xxk f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1093/nar/gkx257 $2 doi
- 035 __
- $a (PubMed)28402514
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxk
- 100 1_
- $a Novák, Petr $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 245 10
- $a TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads / $c P. Novák, L. Ávila Robledillo, A. Koblížková, I. Vrbová, P. Neumann, J. Macas,
- 520 9_
- $a Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
- 650 _2
- $a Base Sequence $7 D001483
- 650 _2
- $a Chromosome Mapping $x methods $7 D002874
- 650 _2
- $a Cluster Analysis $7 D016000
- 650 _2
- $a Computer Graphics $7 D003196
- 650 _2
- $a Consensus Sequence $7 D016384
- 650 _2
- $a Cyperaceae $x genetics $7 D029785
- 650 _2
- $a DNA, Plant $x genetics $7 D018744
- 650 _2
- $a DNA, Satellite $x classification $x genetics $7 D004276
- 650 12
- $a Genome, Plant $7 D018745
- 650 _2
- $a In Situ Hybridization, Fluorescence $7 D017404
- 650 _2
- $a Magnoliopsida $x genetics $7 D019684
- 650 _2
- $a Metaphase $7 D008677
- 650 _2
- $a Pisum sativum $x genetics $7 D018532
- 650 _2
- $a Sequence Analysis, DNA $7 D017422
- 650 12
- $a Software $7 D012984
- 650 _2
- $a Vicia faba $x genetics $7 D031307
- 650 _2
- $a Zea mays $x genetics $7 D003313
- 655 _2
- $a Journal Article $7 D016428
- 700 1_
- $a Ávila Robledillo, Laura $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 700 1_
- $a Koblížková, Andrea $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 700 1_
- $a Vrbová, Iva $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 700 1_
- $a Neumann, Pavel $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 700 1_
- $a Macas, Jiří $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
- 773 0_
- $w MED00003554 $t Nucleic acids research $x 1362-4962 $g Vol. 45, No. 12 (2017), p. e111
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/28402514 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20180515 $b ABA008
- 991 __
- $a 20180515103743 $b ABA008
- 999 __
- $a ok $b bmc $g 1300289 $s 1013505
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2017 $b 45 $c 12 $d e111 $i 1362-4962 $m Nucleic acids research $n Nucleic Acids Res $x MED00003554
- LZP __
- $a Pubmed-20180515