
TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads

P. Novák, L. Ávila Robledillo, A. Koblížková, I. Vrbová, P. Neumann, J. Macas

Nucleic Acids Research. 2017; 45(12): e111.

Language: English. Country: United Kingdom

Document type: journal article

Persistent link: https://www.medvik.cz/link/bmc18016665

Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
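The k-mer step described in the abstract can be illustrated with a toy sketch. This is not TAREAN's implementation: the function names `kmer_counts` and `greedy_monomer`, the greedy walk, and the error-free synthetic reads are all assumptions made for illustration. The sketch counts k-mers across reads, then follows the most frequent overlapping k-mers until the walk closes into a cycle, which loosely mirrors the idea that a tandem repeat produces a circular structure.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(reads, k):
    """Hypothetical helper: walk from the most frequent k-mer, always taking
    the most frequent successor that overlaps by k-1 bases. Returns the
    monomer (in some rotation) if the walk returns to its start, else None."""
    counts = kmer_counts(reads, k)
    if not counts:
        return None
    # Sort first so that ties between equally frequent k-mers are deterministic.
    start = max(sorted(counts), key=lambda kmer: counts[kmer])
    path, current, seen = start, start, {start}
    while True:
        suffix = current[1:]
        candidates = [suffix + base for base in "ACGT" if counts[suffix + base] > 0]
        if not candidates:
            return None  # walk hit a dead end: no circular structure
        nxt = max(candidates, key=lambda kmer: counts[kmer])
        if nxt == start:
            # Cycle closed: drop the k-1 bases that overlap the start again.
            return path[:len(path) - (k - 1)]
        if nxt in seen:
            return None  # looped without returning to the start
        seen.add(nxt)
        path += nxt[-1]
        current = nxt


# Synthetic, error-free example data (assumed, not taken from the paper).
monomer = "ACGGTCATTG"
array = monomer * 6
reads = [array[i:i + 20] for i in range(0, 40, 3)]
consensus = greedy_monomer(reads, k=5)
print(consensus)
```

The recovered sequence may start at any position within the monomer, because reads sample a tandem array at arbitrary offsets. Real data contain sequencing errors and variant monomers, which a greedy walk like this one does not handle.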

Citations provided by Crossref.org

000      
00000naa a2200000 a 4500
001      
bmc18016665
003      
CZ-PrNML
005      
20180515103609.0
007      
ta
008      
180515s2017 xxk f 000 0|eng||
009      
AR
024    7_
$a 10.1093/nar/gkx257 $2 doi
035    __
$a (PubMed)28402514
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxk
100    1_
$a Novák, Petr $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
245    10
$a TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads / $c P. Novák, L. Ávila Robledillo, A. Koblížková, I. Vrbová, P. Neumann, J. Macas,
520    9_
$a Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
650    _2
$a Base Sequence $7 D001483
650    _2
$a Chromosome Mapping $x methods $7 D002874
650    _2
$a Cluster Analysis $7 D016000
650    _2
$a Computer Graphics $7 D003196
650    _2
$a Consensus Sequence $7 D016384
650    _2
$a Cyperaceae $x genetics $7 D029785
650    _2
$a DNA, Plant $x genetics $7 D018744
650    _2
$a DNA, Satellite $x classification $x genetics $7 D004276
650    12
$a Genome, Plant $7 D018745
650    _2
$a In Situ Hybridization, Fluorescence $7 D017404
650    _2
$a Magnoliopsida $x genetics $7 D019684
650    _2
$a Metaphase $7 D008677
650    _2
$a Peas $x genetics $7 D018532
650    _2
$a Sequence Analysis, DNA $7 D017422
650    12
$a Software $7 D012984
650    _2
$a Vicia faba $x genetics $7 D031307
650    _2
$a Zea mays $x genetics $7 D003313
655    _2
$a Journal Article $7 D016428
700    1_
$a Ávila Robledillo, Laura $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
700    1_
$a Koblížková, Andrea $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
700    1_
$a Vrbová, Iva $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
700    1_
$a Neumann, Pavel $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
700    1_
$a Macas, Jiří $u Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic.
773    0_
$w MED00003554 $t Nucleic acids research $x 1362-4962 $g Vol. 45, No. 12 (2017), p. e111
856    41
$u https://pubmed.ncbi.nlm.nih.gov/28402514 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20180515 $b ABA008
991    __
$a 20180515103743 $b ABA008
999    __
$a ok $b bmc $g 1300289 $s 1013505
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2017 $b 45 $c 12 $d e111 $i 1362-4962 $m Nucleic acids research $n Nucleic Acids Res $x MED00003554
LZP    __
$a Pubmed-20180515
