Most cited article - PubMed ID 10905342
Two new families of tandem repeats isolated from genus Vicia using genomic self-priming PCR
Satellite DNA, a class of repetitive sequences forming long arrays of tandemly repeated units, represents substantial portions of many plant genomes yet remains poorly characterized due to various methodological obstacles. Here we show that the genome of the field bean (Vicia faba, 2n = 12), a long-established model for cytogenetic studies in plants, contains a diverse set of satellite repeats, most of which remained concealed until their present investigation. Using next-generation sequencing combined with novel bioinformatics tools, we reconstructed consensus sequences of 23 novel satellite repeats representing 0.008-2.700% of the genome and mapped their distribution on chromosomes. We found that in addition to typical satellites with monomers hundreds of nucleotides long, V. faba contains a large number of satellite repeats with unusually long monomers (687-2033 bp), which are predominantly localized in pericentromeric regions. Using chromatin immunoprecipitation with CenH3 antibody, we revealed an extraordinary diversity of centromeric satellites, consisting of seven repeats with chromosome-specific distribution. We also found that in spite of their different nucleotide sequences, all centromeric repeats are replicated during mid-S phase, while most other satellites are replicated in the first part of late S phase, followed by a single family of FokI repeats representing the latest replicating chromatin.
- MeSH
- Molecular Sequence Annotation MeSH
- Centromere metabolism MeSH
- Chromatin Immunoprecipitation MeSH
- DNA, Plant genetics metabolism MeSH
- Genome, Plant genetics MeSH
- Chromosome Mapping methods MeSH
- Evolution, Molecular MeSH
- DNA Replication Timing genetics MeSH
- DNA, Satellite genetics MeSH
- Sequence Analysis, DNA MeSH
- Vicia faba genetics metabolism MeSH
- Computational Biology MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
- MeSH
- DNA, Plant genetics MeSH
- Genome, Plant * MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Consensus Sequence MeSH
- Zea mays genetics MeSH
- Magnoliopsida genetics MeSH
- Chromosome Mapping methods MeSH
- Metaphase MeSH
- Computer Graphics MeSH
- Cyperaceae genetics MeSH
- DNA, Satellite classification genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Vicia faba genetics MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
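The consensus step described in the TAREAN abstract above (reconstructing a monomer from the most frequent k-mers of a read cluster) can be illustrated with a small Python sketch. This is not the TAREAN implementation: the function names `count_kmers` and `greedy_monomer`, the greedy extension strategy, the `max_len` safeguard and the toy reads are all assumptions made for illustration. It only shows how frequent k-mers from tandemly repeated reads can be chained until the path closes into a circle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(reads, k, max_len=10000):
    """Chain frequent k-mers, starting from the most frequent one, until the
    path returns to the seed k-mer (hypothetical simplification of a
    k-mer based consensus; real data would need error handling)."""
    counts = count_kmers(reads, k)
    if not counts:
        return ""
    seed = counts.most_common(1)[0][0]
    consensus = seed
    current = seed
    while len(consensus) < max_len:
        # Candidate successors overlap the current k-mer by k-1 bases.
        candidates = [
            (counts[current[1:] + base], base)
            for base in "ACGT"
            if counts[current[1:] + base] > 0
        ]
        if not candidates:
            break  # dead end: no circular path was found
        _, base = max(candidates)  # follow the most frequent successor
        current = current[1:] + base
        if current == seed:
            # The path closed into a circle; trim the k-1 bases that
            # already repeat the start of the seed.
            return consensus[:len(consensus) - (k - 1)]
        consensus += base
    return consensus


# Toy data: error-free reads sampled from a tandem array of a made-up monomer.
monomer = "ACGGTCAT"
array = monomer * 10
reads = [array[i:i + 20] for i in range(0, len(array) - 20 + 1, 3)]
result = greedy_monomer(reads, k=4)
print(result, len(result))
```

On these toy reads the function returns a sequence of the monomer's length that is a rotation of the monomer, since a tandem array has no defined start point. Real sequencing reads contain errors and sequence variants, so a practical tool would need a more robust way of weighting and resolving alternative k-mer paths.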
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
- MeSH
- Genome Size * MeSH
- Fabaceae classification genetics MeSH
- Phylogeny MeSH
- Genetic Variation * MeSH
- Genome, Plant * MeSH
- Genomics * methods MeSH
- Terminal Repeat Sequences MeSH
- Evolution, Molecular MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Reproducibility of Results MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
BACKGROUND AND AIMS: Chromosomal evolution, including numerical and structural changes, is a major force in plant diversification and speciation. This study addresses genomic changes associated with the extensive chromosomal variation of the Mediterranean Prospero autumnale complex (Hyacinthaceae), which includes four diploid cytotypes each with a unique combination of chromosome number (x = 5, 6, 7), rDNA loci and genome size. METHODS: A new satellite repeat PaB6 has previously been identified, and monomers were reconstructed from next-generation sequencing (NGS) data of P. autumnale cytotype B⁶B⁶ (2n = 12). Monomers of all other Prospero cytotypes and species were sequenced to check for lineage-specific mutations. Copy number, restriction patterns and methylation levels of PaB6 were analysed using Southern blotting. PaB6 was localized on chromosomes using fluorescence in situ hybridization (FISH). KEY RESULTS: The monomer of PaB6 is 249 bp long, contains several intact and truncated vertebrate-type telomeric repeats and is highly methylated. PaB6 is exceptional because of its high copy number and unprecedented variation among diploid cytotypes, ranging from 10⁴ to 10⁶ copies per 1C. PaB6 is always located in pericentromeric regions of several to all chromosomes. Additionally, two lineages of cytotype B⁷B⁷ (x = 7), possessing either a single or duplicated 5S rDNA locus, differ in PaB6 copy number; the ancestral condition of a single locus is associated with higher PaB6 copy numbers. CONCLUSIONS: Although present in all Prospero species, PaB6 has undergone differential amplification only in chromosomally variable P. autumnale, particularly in cytotypes B⁶B⁶ and B⁵B⁵. These arose via independent chromosomal fusions from x = 7 to x = 6 and 5, respectively, accompanied by genome size increases. The copy numbers of satellite DNA PaB6 are among the highest in angiosperms, and changes of PaB6 are exceptionally dynamic in this group of closely related cytotypes of a single species. The evolution of the PaB6 copy numbers is discussed, and it is suggested that PaB6 represents a recent and highly dynamic system originating from a small pool of ancestral repeats.
- Keywords
- Hyacinthaceae, PaB6, Prospero autumnale, chromosomal evolution, copy number, differential amplification, fluorescence in situ hybridization (FISH), genome size, next-generation sequencing, pericentric satellite DNA
- MeSH
- Chromosomes, Plant genetics MeSH
- Diploidy MeSH
- DNA, Plant genetics MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Liliaceae genetics MeSH
- Models, Genetic MeSH
- Evolution, Molecular MeSH
- Molecular Sequence Data MeSH
- Polymerase Chain Reaction * MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Telomere metabolism MeSH
- DNA Copy Number Variations MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Satellite DNA sequences consist of tandemly arranged repetitive units, up to thousands of nucleotides long, in head-to-tail orientation. The evolutionary processes by which satellites arise and evolve include unequal crossing over, gene conversion, transposition and extrachromosomal circular DNA formation. Large blocks of satellite DNA are often observed in heterochromatic regions of chromosomes and are a typical component of centromeric and telomeric regions. Satellite-rich loci may show specific banding patterns and facilitate chromosome identification and analysis of structural chromosome changes. Unlike many other genomes, nuclear genomes of banana (Musa spp.) are poor in satellite DNA and the information on this class of DNA remains limited. The banana cultivars are seed-sterile clones originating mostly from natural intra-specific crosses within M. acuminata (A genome) and inter-specific crosses between M. acuminata and M. balbisiana (B genome). Previous studies revealed the closely related nature of the A and B genomes, including similarities in repetitive DNA. In this study we focused on two main banana DNA satellites, which were previously identified in silico. Their genomic organization and molecular diversity were analyzed in a set of nineteen Musa accessions, including representatives of A, B and S (M. schizocarpa) genomes and their inter-specific hybrids. The two DNA satellites showed a high level of sequence conservation within, and high homology between, Musa species. FISH with probes for the satellite DNA sequences, rRNA genes and a single-copy BAC clone 2G17 resulted in characteristic chromosome banding patterns in M. acuminata and M. balbisiana, which may aid in determining genomic constitution in interspecific hybrids. In addition to improving the knowledge on Musa satellite DNA, our study increases the number of cytogenetic markers and the number of individual chromosomes that can be identified in Musa.
- MeSH
- Musa genetics MeSH
- Chromosomes, Plant MeSH
- Diploidy MeSH
- Phylogeny MeSH
- Genetic Variation MeSH
- Genome, Plant * MeSH
- Chromosome Mapping MeSH
- Molecular Sequence Data MeSH
- Genes, Plant MeSH
- DNA, Satellite * MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Satellite * MeSH
We carried out a global survey of all major types of transposable elements in Silene latifolia, a model species with sex chromosomes that are in the early stages of their evolution. A shotgun genomic library was screened with genomic DNA to isolate and characterize the most abundant elements. We found that the most common types of elements were the subtelomeric tandem repeat X-43.1 and Gypsy retrotransposons, followed by Copia retrotransposons and LINE non-LTR elements. SINE elements and DNA transposons were less abundant. We also amplified transposable elements with degenerate primers and used them to screen the library. The localization of elements by FISH revealed that most of the Copia elements were accumulated on the Y chromosome. Surprisingly, one type of Gypsy element, which was similar to Ogre elements known from legumes, was almost absent on the Y chromosome but otherwise uniformly distributed on all chromosomes. Other types of elements were ubiquitous on all chromosomes. Moreover, we isolated and characterized two new tandem repeats. One of them, STAR-C, was localized at the centromeres of all chromosomes except the Y chromosome, where it was present on the p-arm. Its variant, STAR-Y, carrying a small deletion, was specifically localized on the q-arm of the Y chromosome. The second tandem repeat, TR1, co-localized with the 45S rDNA cluster in the subtelomeres of five pairs of autosomes. FISH analysis of other Silene species revealed that some elements (e.g., Ogre-like elements) are confined to the section Elisanthe while others (e.g., Copia or Athila-like elements) are also present in more distant species. Similarly, the centromeric satellite STAR-C was conserved in the genus Silene whereas the subtelomeric satellite X-43.1 was specific to the section Elisanthe. Altogether, our data provide an overview of the repetitive sequences in Silene latifolia and reveal that genomic distribution and evolutionary dynamics differ among various repetitive elements. A unique pattern of repeat distribution is found on the Y chromosome, where some elements are accumulated while other elements are conspicuously absent, which probably reflects different forces shaping the Y chromosome.
- MeSH
- Chromosomes, Plant genetics MeSH
- DNA, Plant genetics MeSH
- Species Specificity MeSH
- In Situ Hybridization, Fluorescence MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Silene classification genetics MeSH
- Tandem Repeat Sequences genetics MeSH
- DNA Transposable Elements genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA Transposable Elements MeSH
BACKGROUND: Satellite repeats represent one of the most dynamic components of higher plant genomes, undergoing rapid evolutionary changes of their nucleotide sequences and abundance in a genome. However, the exact molecular mechanisms driving these changes and their eventual regulation are mostly unknown. It has been proposed that amplification and homogenization of satellite DNA could be facilitated by extrachromosomal circular DNA (eccDNA) molecules originated by recombination-based excision from satellite repeat arrays. While the models including eccDNA are attractive for their potential to explain rapid turnover of satellite DNA, the existence of satellite repeat-derived eccDNA has not yet been systematically studied in a wider range of plant genomes. RESULTS: We performed a survey of eccDNA corresponding to nine different families and three subfamilies of satellite repeats in ten species from various genera of higher plants (Arabidopsis, Oryza, Pisum, Secale, Triticum and Vicia). The repeats selected for this study differed in their monomer length, abundance, and chromosomal localization in individual species. Using two-dimensional agarose gel electrophoresis followed by Southern blotting, eccDNA molecules corresponding to all examined satellites were detected. EccDNA occurred in the form of nicked circles ranging from hundreds to over eight thousand nucleotides in size. Within this range the circular molecules occurred preferentially in discrete size intervals corresponding to multiples of monomer or higher-order repeat lengths. CONCLUSION: This work demonstrated that satellite repeat-derived eccDNA is common in plant genomes and thus it can be seriously considered as a potential intermediate in processes driving satellite repeat evolution. The observed size distribution of circular molecules suggests that they are most likely generated by molecular mechanisms based on homologous recombination requiring long stretches of sequence similarity.
- MeSH
- Electrophoresis, Gel, Two-Dimensional MeSH
- DNA, Plant genetics MeSH
- Genetic Markers MeSH
- Genome, Plant MeSH
- Cloning, Molecular MeSH
- DNA, Circular genetics MeSH
- Molecular Sequence Data MeSH
- Plants genetics MeSH
- DNA, Satellite genetics MeSH
- Base Sequence MeSH
- Sequence Alignment MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- Genetic Markers MeSH
- DNA, Circular MeSH
- DNA, Satellite MeSH
BACKGROUND: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massively parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). RESULTS: Analysis of 33.3 Mb of sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. CONCLUSION: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35-48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining.
- MeSH
- Chromosomes, Plant MeSH
- DNA, Plant * classification MeSH
- Genome, Plant * MeSH
- Gene Dosage MeSH
- Glycine max genetics MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Contig Mapping MeSH
- Medicago truncatula genetics MeSH
- Metaphase MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Retroelements genetics MeSH
- Sequence Analysis, DNA MeSH
- Sequence Homology, Nucleic Acid MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Names of Substances
- DNA, Plant * MeSH
- Retroelements MeSH
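The figures quoted in the pea abstract above (33.3 Mb of sequence data giving 0.77% coverage) imply a genome size that the abstract itself does not state. The short Python sketch below works through that arithmetic and converts the quoted repeat percentages into approximate amounts of DNA. The function name `implied_genome_size_mb` is hypothetical, and the resulting values are back-of-the-envelope estimates derived only from the rounded numbers in the abstract, not values reported by the authors.

```python
def implied_genome_size_mb(sequenced_mb, coverage_fraction):
    """Genome size (Mb) implied by a data volume and the fraction of the
    genome that this volume represents."""
    if coverage_fraction <= 0:
        raise ValueError("coverage_fraction must be positive")
    return sequenced_mb / coverage_fraction


# Figures quoted in the abstract: 33.3 Mb of reads at 0.77% coverage.
genome_mb = implied_genome_size_mb(33.3, 0.0077)

# Repeat proportions quoted in the abstract, converted to approximate Mb.
ogre_min_mb = 0.20 * genome_mb  # Ogre-like elements: "over 20%"
repeats_mb = (0.35 * genome_mb, 0.48 * genome_mb)  # reconstructed repeats: 35-48%

print(f"implied 1C genome size: about {genome_mb:.0f} Mb")
print(f"Ogre-like elements: more than about {ogre_min_mb:.0f} Mb")
print(f"reconstructed repeats: about {repeats_mb[0]:.0f}-{repeats_mb[1]:.0f} Mb")
```

The implied genome size comes out at roughly 4.3 Gb, which shows why well under 1% coverage can still sample highly repeated families many times over.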
Satellite sequences of the VicTR-B family are specific for the genus Vicia (Leguminosae), but their abundance varies among the species, being the highest in Vicia sativa and Vicia grandiflora. In this study, we have sequenced multiple randomly cloned VicTR-B fragments from these two species and analyzed their sequence variability, periodicity, and chromosomal localization. We have found that V. sativa VicTR-B sequences are homogeneous with respect to their nucleotide sequences and periodicity (monomers of 38 bp), whereas V. grandiflora repeats are considerably more variable, occurring in at least four distinct sequence subfamilies. Although the periodicity of 38 bp was conserved in most of the V. grandiflora sequences, one of the subfamilies was composed of higher-order repeats of 186 bp, which originated from a pentamer of the basic repeated unit. Individual VicTR-B subfamilies were preferentially located in either intercalary or subtelomeric regions of chromosomes. Interestingly, two V. grandiflora subfamilies with the highest similarity to V. sativa VicTR-B sequences were located in intercalary heterochromatic bands, showing similar chromosomal distribution as the majority of VicTR-B repeats in V. sativa. The other two V. grandiflora subfamilies showing a considerable divergence from V. sativa sequences were found to be accumulated at subtelomeric regions of V. grandiflora chromosomes.
- MeSH
- Chromosomes, Plant chemistry MeSH
- DNA, Plant analysis MeSH
- Genetic Variation MeSH
- In Situ Hybridization, Fluorescence MeSH
- Conserved Sequence * MeSH
- Chromosome Mapping * MeSH
- Molecular Sequence Data MeSH
- DNA, Satellite analysis MeSH
- Base Sequence MeSH
- Vicia genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
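The relationship between the 38 bp basic unit and the 186 bp higher-order repeat mentioned in the VicTR-B abstract above can be checked with simple arithmetic. The Python sketch below uses a hypothetical helper, `nearest_multiple`, to find how many basic units best fit a given repeat length and how many base pairs are left over. The interpretation of the leftover is an inference from the two lengths alone; the abstract does not say how the difference arose.

```python
def nearest_multiple(repeat_len, unit_len):
    """Return (n, deviation): the number of basic units closest to
    repeat_len, and repeat_len minus n * unit_len in bp."""
    n = max(1, round(repeat_len / unit_len))
    return n, repeat_len - n * unit_len


# Lengths quoted in the abstract: 186 bp higher-order repeat, 38 bp monomer.
n_units, deviation = nearest_multiple(186, 38)
print(n_units, deviation)  # 5 -4
```

Five 38 bp units would span 190 bp, so the 186 bp repeat is consistent with a pentamer that is 4 bp shorter than five intact units.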
Amplification and eventual elimination of dispersed repeats, especially those of retroelement origin, account for most of the profound size variability observed among plant genomes. In most higher plants investigated so far, differential accumulation of various families of elements contributes to these differences. Here we report the identification of giant Ty3/gypsy-like retrotransposons from the legume plant Vicia pannonica, which alone make up approximately 38% of the genome of this species. These retrotransposons have structural features of the Ogre elements previously identified in the genomes of pea and Medicago. These features include extreme size (25 kb), the presence of an extra ORF upstream of the gag-pol region, and a putative intron dividing the prot and rt coding sequences. The Ogre elements are evenly dispersed on V. pannonica chromosomes except for terminal regions containing satellite repeats, their individual copies show extraordinary sequence similarity, and at least some of them are transcriptionally active, which suggests their recent amplification. Similar elements were also detected in several other Vicia species but in most cases in significantly lower numbers. However, there was no obvious correlation of the abundance of Ogre sequences with the genome size of these species.
- MeSH
- Gene Amplification MeSH
- DNA, Plant genetics MeSH
- Species Specificity MeSH
- Fabaceae genetics MeSH
- Genome, Plant * MeSH
- Gene Dosage MeSH
- In Situ Hybridization, Fluorescence MeSH
- Introns MeSH
- Conserved Sequence MeSH
- Molecular Sequence Data MeSH
- Open Reading Frames MeSH
- Retroelements genetics MeSH
- Plant Proteins genetics MeSH
- Base Sequence MeSH
- Vicia genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Names of Substances
- DNA, Plant MeSH
- Retroelements MeSH
- Plant Proteins MeSH