Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data

J. Macas, P. Neumann, P. Novák, J. Jiang

Bioinformatics. 2010; 26(17): 2101-8. [pub] 20100708

Language English Country England, Great Britain

Document type Journal Article, Research Support, Non-U.S. Gov't, Research Support, U.S. Gov't, Non-P.H.S.

MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies.

RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences: 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. The reconstructed consensus sequences, as well as the assessment of overall divergence of the k-mer spectra, revealed high similarity between the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used to compare CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
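The alignment-free idea in the abstract can be illustrated with a minimal sketch: count every k-mer in a set of reads, normalize the counts to relative frequencies, and compare two such spectra with a single number. This is not the authors' implementation. The function names `kmer_spectrum` and `spectrum_distance` are hypothetical, and the total variation distance used here is an assumption chosen for simplicity, since the abstract does not say which divergence measure was applied.

```python
from collections import Counter


def kmer_spectrum(sequences, k):
    """Return the relative frequency of each k-mer observed in the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def spectrum_distance(a, b):
    """Total variation distance between two spectra (assumed measure).

    0.0 means identical spectra, 1.0 means no k-mer is shared.
    """
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(x, 0.0) - b.get(x, 0.0)) for x in keys)


if __name__ == "__main__":
    # Toy reads invented for illustration; they are not CentO sequences.
    set_a = ["ACGTACGTAC", "CGTACGTACG"]
    set_b = ["ACGTACGAAC", "CGTTCGTACG"]
    spec_a = kmer_spectrum(set_a, 4)
    spec_b = kmer_spectrum(set_b, 4)
    print(spectrum_distance(spec_a, spec_b))
```

Because no multiple alignment is built, the cost grows only linearly with the total length of the reads, which is what makes this kind of comparison practical for megabase-scale read sets.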

000      
00000naa a2200000 a 4500
001      
bmc12026265
003      
CZ-PrNML
005      
20121206114803.0
007      
ta
008      
120817e20100708enk f 000 0#eng||
009      
AR
024    7_
$a 10.1093/bioinformatics/btq343 $2 doi
035    __
$a (PubMed)20616383
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Macas, Jirí $u Institute of Plant Molecular Biology, Biology Centre ASCR, Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic. macas@umbr.cas.cz
245    10
$a Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data / $c J. Macas, P. Neumann, P. Novák, J. Jiang
520    9_
$a MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences: 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. The reconstructed consensus sequences, as well as the assessment of overall divergence of the k-mer spectra, revealed high similarity between the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used to compare CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
650    _2
$a Base Sequence $7 D001483
650    _2
$a Centromere $x genetics $7 D002503
650    _2
$a Chromosomes, Plant $x genetics $7 D032461
650    _2
$a Conserved Sequence $x genetics $7 D017124
650    _2
$a DNA, Plant $x genetics $7 D018744
650    _2
$a DNA, Satellite $x genetics $7 D004276
650    _2
$a Molecular Sequence Data $7 D008969
650    _2
$a Oryza $x genetics $7 D012275
650    _2
$a Sequence Analysis, DNA $x methods $7 D017422
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
655    _2
$a Research Support, U.S. Gov't, Non-P.H.S. $7 D013486
700    1_
$a Neumann, Pavel
700    1_
$a Novák, Petr
700    1_
$a Jiang, Jiming
773    0_
$w MED00008115 $t Bioinformatics (Oxford, England) $x 1367-4811 $g Vol. 26, No. 17 (20100708), pp. 2101-8
856    41
$u https://pubmed.ncbi.nlm.nih.gov/20616383 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y m
990    __
$a 20120817 $b ABA008
991    __
$a 20121206114836 $b ABA008
999    __
$a ok $b bmc $g 948307 $s 783611
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2010 $b 26 $c 17 $d 2101-8 $e 20100708 $i 1367-4811 $m Bioinformatics $n Bioinformatics $x MED00008115
LZP    __
$a Pubmed-20120817/10/04
