Most cited article - PubMed ID 16788823
Sequence homogenization and chromosomal localization of VicTR-B satellites differ between closely related Vicia species
Amplification of monomer sequences into long contiguous arrays is the main feature distinguishing satellite DNA from other tandem repeats, yet it is also the main obstacle in its investigation because these arrays are in principle difficult to assemble. Here we explore an alternative, assembly-free approach that utilizes ultra-long Oxford Nanopore reads to infer the length distribution of satellite repeat arrays, their association with other repeats and the prevailing sequence periodicities. Using the satellite DNA-rich legume plant Lathyrus sativus as a model, we demonstrated this approach by analyzing 11 major satellite repeats using a set of nanopore reads ranging from 30 to over 200 kb in length and representing 0.73× genome coverage. We found surprising differences between the analyzed repeats because only two of them were predominantly organized in long arrays typical for satellite DNA. The remaining nine satellites were found to be derived from short tandem arrays located within LTR-retrotransposons that occasionally expanded in length. While the corresponding LTR-retrotransposons were dispersed across the genome, this array expansion occurred mainly in the primary constrictions of the L. sativus chromosomes, which suggests that these genome regions are favourable for satellite DNA accumulation.
- Keywords
- Lathyrus sativus, centromeres, fluorescence in situ hybridization (FISH), heterochromatin, long-range organization, nanopore sequencing, satellite DNA, sequence evolution, technical advance
- MeSH
- Centromere MeSH
- Chromosomes, Plant MeSH
- DNA, Plant genetics MeSH
- Gene Frequency * MeSH
- Genome, Plant MeSH
- Heterochromatin MeSH
- Lathyrus genetics MeSH
- Evolution, Molecular MeSH
- Nanopores * MeSH
- Retroelements * MeSH
- DNA, Satellite * MeSH
- Tandem Repeat Sequences * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- Heterochromatin MeSH
- Retroelements * MeSH
- DNA, Satellite * MeSH
Satellite DNA (satDNA) is the most variable fraction of the eukaryotic genome. Related species share a common ancestral satDNA library, and a change in any library component in a particular lineage results in interspecific differences. Although the general developmental trend is clear, our knowledge of the origin and dynamics of satDNAs is still fragmentary. Here, we explore whole genome shotgun Illumina reads using the RepeatExplorer (RE) pipeline to infer satDNA family life stories in the genomes of Chenopodium species. The seven diploids studied represent separate lineages and provide an example of a species complex typical for angiosperms. Using similarity searches, the RE pipeline allowed us to identify a satDNA family with a basic monomer of ~40 bp and to trace its transformation from the reconstructed ancestral sequence to the species-specific sequences. As a result, three types of satDNA family evolutionary development were distinguished: (i) concerted evolution with mutation and recombination events; (ii) concerted evolution with a trend toward increased complexity and length of the satellite monomer; and (iii) non-concerted evolution, with low levels of homogenization and multidirectional trends. The third type is an example of entire repeatome transformation, which produces a novel set of satDNA families, and genomes showing non-concerted evolution are proposed as a significant source of genomic diversity.
- Keywords
- genome evolution, high order repeats, next-generation sequencing, plants, satellite DNA
- MeSH
- Chenopodium genetics MeSH
- Diploidy MeSH
- DNA, Plant genetics MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Genome Components MeSH
- Evolution, Molecular MeSH
- DNA, Satellite genetics MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
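The k-mer decomposition step mentioned in this abstract can be illustrated with a minimal sketch. This is not TAREAN's implementation: the function name `count_kmers`, the toy reads, and the choice of k = 4 are assumptions made purely for illustration, and reverse-complement strands are ignored. It shows that reads drawn from a perfect tandem array yield only a small, cyclic set of k-mers, each seen many times.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping window of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy reads sampled from a hypothetical tandem array of the monomer "ACGT".
reads = ["ACGTACGTACGT", "GTACGTACGTAC"]
kmer_counts = count_kmers(reads, k=4)

# The most frequent k-mers are the candidates a consensus would be built from.
for kmer, n in kmer_counts.most_common():
    print(kmer, n)
```

In this toy case only four distinct 4-mers occur, and they are the cyclic rotations of the monomer. Real data contain sequencing errors and diverged monomer copies, so frequency thresholds and a graph-based assembly of the k-mers would be needed in practice.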
- MeSH
- DNA, Plant genetics MeSH
- Genome, Plant * MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Consensus Sequence MeSH
- Zea mays genetics MeSH
- Magnoliopsida genetics MeSH
- Chromosome Mapping methods MeSH
- Metaphase MeSH
- Computer Graphics MeSH
- Cyperaceae genetics MeSH
- DNA, Satellite classification genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Vicia faba genetics MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Satellite DNA sequences consist of tandemly arranged repetitive units up to thousands of nucleotides long in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extrachromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity were analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and high homology between, Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana, which may aid in determining genomic constitution in interspecific hybrids. In addition to improving the knowledge of Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes that can be identified in Musa.
- MeSH
- Musa genetics MeSH
- Chromosomes, Plant MeSH
- Diploidy MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Plant * MeSH
- Chromosome Mapping MeSH
- Molecular Sequence Data MeSH
- Genes, Plant MeSH
- DNA, Satellite * MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Satellite * MeSH
BACKGROUND: Satellite repeats represent one of the most dynamic components of higher plant genomes, undergoing rapid evolutionary changes of their nucleotide sequences and abundance in a genome. However, the exact molecular mechanisms driving these changes and their eventual regulation are mostly unknown. It has been proposed that amplification and homogenization of satellite DNA could be facilitated by extrachromosomal circular DNA (eccDNA) molecules originated by recombination-based excision from satellite repeat arrays. While the models including eccDNA are attractive for their potential to explain rapid turnover of satellite DNA, the existence of satellite repeat-derived eccDNA has not yet been systematically studied in a wider range of plant genomes. RESULTS: We performed a survey of eccDNA corresponding to nine different families and three subfamilies of satellite repeats in ten species from various genera of higher plants (Arabidopsis, Oryza, Pisum, Secale, Triticum and Vicia). The repeats selected for this study differed in their monomer length, abundance, and chromosomal localization in individual species. Using two-dimensional agarose gel electrophoresis followed by Southern blotting, eccDNA molecules corresponding to all examined satellites were detected. EccDNA occurred in the form of nicked circles ranging from hundreds to over eight thousand nucleotides in size. Within this range the circular molecules occurred preferentially in discrete size intervals corresponding to multiples of monomer or higher-order repeat lengths. CONCLUSION: This work demonstrated that satellite repeat-derived eccDNA is common in plant genomes and thus it can be seriously considered as a potential intermediate in processes driving satellite repeat evolution. The observed size distribution of circular molecules suggests that they are most likely generated by molecular mechanisms based on homologous recombination requiring long stretches of sequence similarity.
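The observation that circle sizes cluster at multiples of the monomer length can be expressed as a simple check. The sketch below is only an illustration: the helper `nearest_multiple`, the 180 bp monomer, and the circle sizes are hypothetical values, not measurements from the study.

```python
def nearest_multiple(size_bp, monomer_bp):
    """Return (n, deviation): the closest whole number of monomers
    and how far the observed size lies from n * monomer_bp."""
    n = max(1, round(size_bp / monomer_bp))
    return n, size_bp - n * monomer_bp


# Hypothetical circle sizes (bp) checked against a hypothetical 180 bp monomer.
monomer_bp = 180
for size_bp in (360, 545, 1070):
    n, deviation = nearest_multiple(size_bp, monomer_bp)
    print(f"{size_bp} bp ~ {n} monomers (deviation {deviation:+d} bp)")
```

Small deviations across many circles would be consistent with sizes falling into discrete monomer-multiple intervals, although gel-based size estimates are too coarse for this to be more than a rough test.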
- MeSH
- Electrophoresis, Gel, Two-Dimensional MeSH
- DNA, Plant genetics MeSH
- Genetic Markers MeSH
- Genome, Plant MeSH
- Cloning, Molecular MeSH
- DNA, Circular genetics MeSH
- Molecular Sequence Data MeSH
- Plants genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Alignment MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- Genetic Markers MeSH
- DNA, Circular MeSH
- DNA, Satellite MeSH
BACKGROUND: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massively parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). RESULTS: Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in Medicago truncatula. CONCLUSION: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35-48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining.
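The relation between the two figures quoted in this abstract (33.3 Mb of sequence, 0.77% coverage) can be made explicit with a back-of-the-envelope calculation. The function name `implied_genome_size_mb` is an assumption for illustration, and the sketch assumes coverage is simply sequenced bases divided by haploid genome size.

```python
def implied_genome_size_mb(sequenced_mb, coverage_percent):
    """Genome size (Mb) implied by coverage = sequenced / genome size."""
    return sequenced_mb / (coverage_percent / 100.0)


# Figures taken from the abstract: 33.3 Mb of reads at 0.77% coverage.
size_mb = implied_genome_size_mb(33.3, 0.77)
print(f"Implied genome size: about {size_mb / 1000:.1f} Gb")
```

Under that assumption the two numbers imply a haploid genome of roughly 4.3 Gb, which shows how little of the genome a single run sampled.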
- MeSH
- Chromosomes, Plant MeSH
- DNA, Plant * classification MeSH
- Genome, Plant * MeSH
- Gene Dosage MeSH
- Glycine max genetics MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Contig Mapping MeSH
- Medicago truncatula genetics MeSH
- Metaphase MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Retroelements genetics MeSH
- Sequence Analysis, DNA MeSH
- Sequence Homology, Nucleic Acid MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Names of Substances
- DNA, Plant * MeSH
- Retroelements MeSH