Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data
Language English
Country England, Great Britain
Media print-electronic
Document type Journal Article, Research Support, Non-U.S. Gov't, Research Support, U.S. Gov't, Non-P.H.S.
PubMed
20616383
DOI
10.1093/bioinformatics/btq343
PII: btq343
MeSH
- Centromere genetics
- Chromosomes, Plant genetics
- DNA, Plant genetics
- Conserved Sequence genetics
- Molecular Sequence Data
- Oryza genetics
- DNA, Satellite genetics
- Base Sequence
- Sequence Analysis, DNA methods
Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, U.S. Gov't, Non-P.H.S.
Names of Substances
- DNA, Plant
- DNA, Satellite
MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies.
RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences: 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. The reconstructed consensus sequences, as well as the assessment of overall divergence of the k-mer spectra, revealed high similarity between the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. In contrast, considerable differences were revealed when these methods were used to compare CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
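The alignment-free idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_counts`, `to_frequencies`, `jensen_shannon`) are hypothetical, the toy reads are invented and are not CentO sequences, and Jensen-Shannon divergence is used here only as one plausible way to compare two k-mer spectra, since the abstract does not state which divergence measure was applied.

```python
from collections import Counter
from math import log2


def kmer_counts(reads, k):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def to_frequencies(counts):
    """Convert raw k-mer counts into relative frequencies (a k-mer spectrum)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two spectra.

    Assumed comparison measure, not taken from the paper.
    0 means identical spectra; 1 means no k-mer is shared.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2
        if pi:
            div += 0.5 * pi * log2(pi / mi)
        if qi:
            div += 0.5 * qi * log2(qi / mi)
    return div


if __name__ == "__main__":
    # Invented toy reads standing in for two read sets.
    set_a = ["ACGTACGTAC", "CGTACGTACG"]
    set_b = ["ACGTACGTAC", "ACGTTCGTAC"]
    spec_a = to_frequencies(kmer_counts(set_a, k=4))
    spec_b = to_frequencies(kmer_counts(set_b, k=4))
    print(sorted(spec_a.items(), key=lambda kv: -kv[1])[:3])  # most frequent k-mers
    print(jensen_shannon(spec_a, spec_b))
```

The sketch counts k-mers on the given strand only. Reads from sequencing projects can originate from either strand, so a real analysis would likely need to account for reverse complements, and the choice of k would depend on the monomer length and the size of the dataset.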