Middle-range clustering of nucleotides in genomes
Language: English
Country: England, Great Britain
Media: print
Document type: Journal Article, Research Support, Non-U.S. Gov't
- MeSH
  - Algorithms *
  - DNA
  - Genome *
  - Humans
  - Molecular Sequence Data
  - Nucleotides / genetics
  - Base Sequence
  - Animals
- Check Tag
  - Humans
  - Animals
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
- Names of Substances
  - DNA
  - Nucleotides
We propose a novel, transparent and very simple algorithm to analyze middle-range correlations in genomic nucleotide sequences. Applying this algorithm to the EMBL Nucleotide Sequence Database shows that all four nucleotides cluster in the genomic sequences of eukaryotes on a scale of several hundred base pairs. In prokaryotes, the clustering is weak but still evident. Within the clusters of a given nucleotide, the other three bases are deficient: A is the most deficient nucleotide in the clusters of C, and vice versa, and G is the most deficient nucleotide in the clusters of T, and vice versa. The algorithm also detects CG islands, extending over 1 kb, in vertebrate sequences. In plants, the CG islands are shown to be much smaller, if they exist at all. The TA doublet also shows a clustering tendency; other doublets do not cluster. We observe no strong correlation between nucleotides separated in genomes by more than 1 kb.
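The abstract does not spell out the algorithm, so the following Python sketch is not the authors' method. It only illustrates one simple way to quantify the kind of clustering described: for every occurrence of a chosen nucleotide, count how often another nucleotide appears within a fixed distance, and compare that local frequency with the frequency over the whole sequence. The function name `enrichment`, its parameters, and the default window of 300 bp are assumptions made for illustration.

```python
def enrichment(seq: str, center: str, other: str, window: int = 300) -> float:
    """Ratio of the local frequency of `other` around each `center` base
    to the global frequency of `other` in `seq`.

    Values above 1 suggest that `other` is enriched near `center`
    (clustering when center == other); values below 1 suggest deficiency.
    Hypothetical helper, not the algorithm from the paper.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence is empty")

    # prefix[i] holds the number of `other` bases in seq[:i],
    # so any window count is a difference of two prefix values.
    prefix = [0] * (n + 1)
    for i, base in enumerate(seq):
        prefix[i + 1] = prefix[i] + (base == other)

    global_freq = prefix[n] / n
    hits = 0        # occurrences of `other` seen in all windows
    positions = 0   # total number of window positions inspected

    for i, base in enumerate(seq):
        if base != center:
            continue
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        # Exclude the center position itself from both counts.
        hits += prefix[hi] - prefix[lo] - (center == other)
        positions += (hi - lo) - 1

    if positions == 0 or global_freq == 0:
        raise ValueError("center or other base not present in sequence")
    return (hits / positions) / global_freq


if __name__ == "__main__":
    # Toy sequence: a block of A followed by a block of C.
    toy = "A" * 500 + "C" * 500
    print("A near A:", enrichment(toy, "A", "A", window=50))
    print("C near A:", enrichment(toy, "A", "C", window=50))
```

On the toy sequence the first ratio is well above 1 and the second well below 1, which is the qualitative pattern the abstract reports for a nucleotide and the bases deficient in its clusters. Repeating the calculation for several window sizes would show the distance scale at which the ratio returns to 1.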