Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data
J. Macas, P. Neumann, P. Novák, J. Jiang
Language English Country England, Great Britain
Document type Journal Article, Research Support, Non-U.S. Gov't, Research Support, U.S. Gov't, Non-P.H.S.
NLK
Free Medical Journals
from 1996 to 1 year ago
PubMed Central
from 2007
Open Access Digital Library
from 1996-01-01
Medline Complete (EBSCOhost)
from 1998-01-01
Oxford Journals Open Access Collection
from 1985-01-01 to 2022-09-30
Oxford Journals Open Access Collection
from 1985-01-01
ROAD: Directory of Open Access Scholarly Resources
from 1998
- MeSH
- Centromere genetics MeSH
- Chromosomes, Plant genetics MeSH
- DNA, Plant genetics MeSH
- Conserved Sequence genetics MeSH
- Molecular Sequence Data MeSH
- Oryza genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
MOTIVATION: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next-generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole-genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
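The alignment-free idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the value of k or the divergence measure used, so the choice of Jensen-Shannon divergence, the function names `kmer_spectrum`, `js_divergence` and `top_kmers`, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def kmer_spectrum(sequences, k):
    """Return relative frequencies of all k-mers found in the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two k-mer spectra.

    Assumed measure, not taken from the record: 0 means identical
    spectra, 1 means the spectra share no k-mers.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        a = p.get(kmer, 0.0)
        b = q.get(kmer, 0.0)
        m = (a + b) / 2
        if a:
            div += 0.5 * a * log2(a / m)
        if b:
            div += 0.5 * b * log2(b / m)
    return div


def top_kmers(spectrum, n):
    """Return the n most frequent k-mers, a rough proxy for conserved regions."""
    return sorted(spectrum.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Made-up toy reads, not CentO data.
    reads_a = ["ACGTACGTACGT", "ACGTACGAACGT"]
    reads_b = ["ACGTACGTACGT", "TTGTACGTACCT"]
    spec_a = kmer_spectrum(reads_a, k=4)
    spec_b = kmer_spectrum(reads_b, k=4)
    print(top_kmers(spec_a, 3))
    print(js_divergence(spec_a, spec_b))
```

Counting k-mers needs only one pass over the reads and no alignment, which is why this kind of statistic scales to large read sets; the most frequent k-mers then point to candidate conserved regions, and a single divergence value gives a coarse comparison between two datasets.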
- 000
- 00000naa a2200000 a 4500
- 001
- bmc12026265
- 003
- CZ-PrNML
- 005
- 20121206114803.0
- 007
- ta
- 008
- 120817e20100708enk f 000 0#eng||
- 009
- AR
- 024 7_
- $a 10.1093/bioinformatics/btq343 $2 doi
- 035 __
- $a (PubMed)20616383
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a enk
- 100 1_
- $a Macas, Jirí $u Institute of Plant Molecular Biology, Biology Centre ASCR, Branisovska 31, CZ-37005, Ceske Budejovice, Czech Republic. macas@umbr.cas.cz
- 245 10
- $a Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data / $c J. Macas, P. Neumann, P. Novák, J. Jiang,
- 520 9_
- $a MOTIVATION: Satellite DNA makes up significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free, approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies. RESULTS: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats.
- 650 _2
- $a sekvence nukleotidů $7 D001483
- 650 _2
- $a centromera $x genetika $7 D002503
- 650 _2
- $a chromozomy rostlin $x genetika $7 D032461
- 650 _2
- $a konzervovaná sekvence $x genetika $7 D017124
- 650 _2
- $a DNA rostlinná $x genetika $7 D018744
- 650 _2
- $a satelitní DNA $x genetika $7 D004276
- 650 _2
- $a molekulární sekvence - údaje $7 D008969
- 650 _2
- $a rýže (rod) $x genetika $7 D012275
- 650 _2
- $a sekvenční analýza DNA $x metody $7 D017422
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a práce podpořená grantem $7 D013485
- 655 _2
- $a Research Support, U.S. Gov't, Non-P.H.S. $7 D013486
- 700 1_
- $a Neumann, Pavel
- 700 1_
- $a Novák, Petr
- 700 1_
- $a Jiang, Jiming
- 773 0_
- $w MED00008115 $t Bioinformatics (Oxford, England) $x 1367-4811 $g Roč. 26, č. 17 (20100708), s. 2101-8
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/20616383 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y m
- 990 __
- $a 20120817 $b ABA008
- 991 __
- $a 20121206114836 $b ABA008
- 999 __
- $a ok $b bmc $g 948307 $s 783611
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2010 $b 26 $c 17 $d 2101-8 $e 20100708 $i 1367-4811 $m Bioinformatics $n Bioinformatics $x MED00008115
- LZP __
- $a Pubmed-20120817/10/04