Most cited article - PubMed ID 19563868
Hypervariable 3' UTR region of plant LTR-retrotransposons as a source of novel satellite repeats
Transposable elements (TEs) constitute a significant part of plant genomes and shape their genomic landscape. While some TEs are ubiquitously dispersed, other elements specifically occupy discrete genomic loci. The evolutionary forces behind the chromosomal localization of TEs are poorly understood. Therefore, we first review specific chromosomal niches where TEs are often localized, including (i) centromeres, (ii) (sub)telomeres, (iii) genes, and (iv) sex chromosomes. In the second part of this review, we focus on the processes underlying the unequal distribution of various TEs in genomes, including (i) purifying selection, (ii) insertion site preference or targeting of TEs, (iii) post-insertion ectopic recombination between TEs, and (iv) spatiotemporal regulation of TE jumping. Using a combination of these processes, we explain the distribution of TEs on sex chromosomes. We also describe the phenomena of mutual nesting of TEs, epigenetic mark silencing in TEs, and TE interactions in the 3D interphase nucleus in relation to TE localization. We summarize the functional consequences of TE distribution and relate them to cell functioning and genome evolution.
- Keywords
- Centromere, chromosomes, plant genome, recombination, transcription factor, transposable elements
- MeSH
- Centromere genetics MeSH
- Chromosomes, Plant * genetics MeSH
- Genome, Plant genetics MeSH
- Evolution, Molecular MeSH
- Plants * genetics MeSH
- DNA Transposable Elements * genetics MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Names of Substances
- DNA Transposable Elements * MeSH
Satellite repeats are major sequence constituents of centromeres in many plant and animal species. Within a species, a single family of satellite sequences typically occupies centromeres of all chromosomes and is absent from other parts of the genome. Due to their common origin, sequence similarities exist among the centromere-specific satellites in related species. Here, we report a remarkably different pattern of centromere evolution in the plant tribe Fabeae, which includes genera Pisum, Lathyrus, Vicia, and Lens. By immunoprecipitation of centromeric chromatin with CENH3 antibodies, we identified and characterized a large and diverse set of 64 families of centromeric satellites in 14 species. These families differed in their nucleotide sequence, monomer length (33-2,979 bp), and abundance in individual species. Most families were species-specific, and most species possessed multiple (2-12) satellites in their centromeres. Some of the repeats that were shared by several species exhibited promiscuous patterns of centromere association, being located within CENH3 chromatin in some species, but apart from the centromeres in others. Moreover, FISH experiments revealed that the same family could assume centromeric and noncentromeric positions even within a single species. Taken together, these findings suggest that Fabeae centromeres are not shaped by the coevolution of a single centromeric satellite with its interacting CENH3 proteins, as proposed by the centromere drive model. This conclusion is also supported by the absence of pervasive adaptive evolution of CENH3 sequences retrieved from Fabeae species.
- Keywords
- CENH3, ChIP-seq, centromere evolution, plant chromosomes, satellite DNA
- MeSH
- Centromere chemistry MeSH
- Species Specificity MeSH
- Fabaceae genetics MeSH
- Genetic Variation * MeSH
- DNA, Satellite chemistry MeSH
- Selection, Genetic MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
- Names of Substances
- DNA, Satellite MeSH
Amplification of monomer sequences into long contiguous arrays is the main feature distinguishing satellite DNA from other tandem repeats, yet it is also the main obstacle in its investigation because these arrays are in principle difficult to assemble. Here we explore an alternative, assembly-free approach that utilizes ultra-long Oxford Nanopore reads to infer the length distribution of satellite repeat arrays, their association with other repeats and the prevailing sequence periodicities. Using the satellite DNA-rich legume plant Lathyrus sativus as a model, we demonstrated this approach by analyzing 11 major satellite repeats using a set of nanopore reads ranging from 30 to over 200 kb in length and representing 0.73× genome coverage. We found surprising differences between the analyzed repeats because only two of them were predominantly organized in long arrays typical for satellite DNA. The remaining nine satellites were found to be derived from short tandem arrays located within LTR-retrotransposons that occasionally expanded in length. While the corresponding LTR-retrotransposons were dispersed across the genome, this array expansion occurred mainly in the primary constrictions of the L. sativus chromosomes, which suggests that these genome regions are favourable for satellite DNA accumulation.
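One simple way to look for the prevailing sequence periodicities mentioned above is to record, for every k-mer in a read, the distance to its previous occurrence and to take the most frequent distance as the likely monomer length. The sketch below is illustrative only and is not the procedure used in the study; the function names and the default k = 6 are assumptions made for this example.

```python
from collections import Counter


def kmer_distance_histogram(seq, k=6):
    """Count distances between consecutive occurrences of identical k-mers.

    Hypothetical helper: for each k-mer, the distance to its previous
    occurrence in the same read is added to a histogram.
    """
    last_seen = {}      # k-mer -> position of its most recent occurrence
    hist = Counter()    # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            hist[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return hist


def dominant_period(seq, k=6):
    """Return the most frequent k-mer recurrence distance, or None."""
    hist = kmer_distance_histogram(seq, k)
    if not hist:
        return None
    return hist.most_common(1)[0][0]
```

On a read that contains a tandem array, the dominant distance is expected to correspond to the monomer length, while reads without tandem repeats give a flat or empty histogram. Real nanopore reads contain errors, so a practical analysis would need to tolerate mismatches and indels.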
- Keywords
- Lathyrus sativus, centromeres, fluorescence in situ hybridization (FISH), heterochromatin, long-range organization, nanopore sequencing, satellite DNA, sequence evolution, technical advance
- MeSH
- Centromere MeSH
- Chromosomes, Plant MeSH
- DNA, Plant genetics MeSH
- Gene Frequency * MeSH
- Genome, Plant MeSH
- Heterochromatin MeSH
- Lathyrus genetics MeSH
- Evolution, Molecular MeSH
- Nanopores * MeSH
- Retroelements * MeSH
- DNA, Satellite * MeSH
- Tandem Repeat Sequences * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- Heterochromatin MeSH
- Retroelements * MeSH
- DNA, Satellite * MeSH
BACKGROUND: Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single species or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons. RESULTS: We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages, respectively. We also characterized various features of LTR-retrotransposon sequences including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains. CONCLUSIONS: We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes.
REXdb-related tools can be accessed through the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or by using a standalone version of REXdb, which can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
- Keywords
- LTR-retrotransposons, Polyprotein domains, Primer binding site, RepeatExplorer, Transposable elements
- Publication type
- Journal Article MeSH
Satellite DNA, a class of repetitive sequences forming long arrays of tandemly repeated units, represents substantial portions of many plant genomes yet remains poorly characterized due to various methodological obstacles. Here we show that the genome of the field bean (Vicia faba, 2n = 12), a long-established model for cytogenetic studies in plants, contains a diverse set of satellite repeats, most of which remained concealed until their present investigation. Using next-generation sequencing combined with novel bioinformatics tools, we reconstructed consensus sequences of 23 novel satellite repeats representing 0.008-2.700% of the genome and mapped their distribution on chromosomes. We found that in addition to typical satellites with monomers hundreds of nucleotides long, V. faba contains a large number of satellite repeats with unusually long monomers (687-2033 bp), which are predominantly localized in pericentromeric regions. Using chromatin immunoprecipitation with CenH3 antibody, we revealed an extraordinary diversity of centromeric satellites, consisting of seven repeats with chromosome-specific distribution. We also found that in spite of their different nucleotide sequences, all centromeric repeats are replicated during mid-S phase, while most other satellites are replicated in the first part of late S phase, followed by a single family of FokI repeats representing the latest replicating chromatin.
- MeSH
- Molecular Sequence Annotation MeSH
- Centromere metabolism MeSH
- Chromatin Immunoprecipitation MeSH
- DNA, Plant genetics metabolism MeSH
- Genome, Plant genetics MeSH
- Chromosome Mapping methods MeSH
- Evolution, Molecular MeSH
- DNA Replication Timing genetics MeSH
- DNA, Satellite genetics MeSH
- Sequence Analysis, DNA MeSH
- Vicia faba genetics metabolism MeSH
- Computational Biology MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent k-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of Vicia faba and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence in situ hybridization.
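The k-mer decomposition step described above can be pictured with a toy example: reads from one cluster are broken into overlapping k-mers, and the most frequent ones are kept as the raw material for a consensus. This is a minimal sketch of the general idea and not TAREAN's actual implementation; the function names and parameters are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Decompose every read into overlapping k-mers and count them."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def top_kmers(reads, k, n):
    """Return the n most frequent k-mers as (kmer, count) pairs."""
    return count_kmers(reads, k).most_common(n)
```

In an error-free tandem array whose monomer has no internal repeat of length k, the k-mers cycle through the rotations of the monomer, so the number of distinct k-mers equals the monomer length. This cyclic behaviour is consistent with the circular structures that the abstract describes for satellite clusters.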
- MeSH
- DNA, Plant genetics MeSH
- Genome, Plant * MeSH
- Pisum sativum genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Consensus Sequence MeSH
- Zea mays genetics MeSH
- Magnoliopsida genetics MeSH
- Chromosome Mapping methods MeSH
- Metaphase MeSH
- Computer Graphics MeSH
- Cyperaceae genetics MeSH
- DNA, Satellite classification genetics MeSH
- Base Sequence MeSH
- Sequence Analysis, DNA MeSH
- Cluster Analysis MeSH
- Software * MeSH
- Vicia faba genetics MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
A significant part of eukaryotic genomes is formed by transposable elements (TEs) containing not only genes but also regulatory sequences. Some of the regulatory sequences located within TEs can form secondary structures like hairpins or three-stranded (triplex DNA) and four-stranded (quadruplex DNA) conformations. This review focuses on recent evidence showing that G-quadruplex-forming sequences in particular are often present in specific parts of TEs in plants and humans. We discuss the potential role of these structures in the TE life cycle as well as the impact of G-quadruplexes on replication, transcription, translation, chromatin status, and recombination. The aim of this review is to emphasize that TEs may serve as vehicles for the genomic spread of G-quadruplexes. These non-canonical DNA structures and their conformational switches may constitute another regulatory system that, together with small and long non-coding RNA molecules and proteins, contributes to the complex cellular network resulting in the large diversity of eukaryotes.
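Putative G-quadruplex-forming sequences are often screened for with a simple pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The abstract does not state which search criteria the reviewed studies used, so the pattern, the loop-length limits and the function name below are assumptions chosen for illustration.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the given strand only.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_putative_g4(seq):
    """Return (start, end, motif) for each non-overlapping pattern match.

    Coordinates are 0-based and end-exclusive. Only the supplied strand
    is scanned; the complementary strand would need a C-rich pattern.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]
```

A pattern match only indicates sequence potential; whether a quadruplex actually forms depends on conditions that a sequence scan cannot capture.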
- Keywords
- DNA and RNA quadruplexes, G-quadruplexes, LTR retrotransposons, recombination, replication, transcription, transposable elements
- MeSH
- DNA-Binding Proteins metabolism MeSH
- G-Quadruplexes * MeSH
- Genomics MeSH
- Humans MeSH
- Open Reading Frames MeSH
- Gene Expression Regulation MeSH
- Regulatory Sequences, Nucleic Acid MeSH
- Repetitive Sequences, Nucleic Acid MeSH
- DNA Replication MeSH
- Retroelements genetics MeSH
- RNA chemistry genetics MeSH
- Plants genetics MeSH
- DNA Transposable Elements genetics MeSH
- Protein Binding MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Review MeSH
- Names of Substances
- DNA-Binding Proteins MeSH
- Retroelements MeSH
- RNA MeSH
- DNA Transposable Elements MeSH
Long terminal repeat (LTR) retrotransposons make up substantial parts of most higher plant genomes, where they accumulate due to their replicative mode of transposition. Although the transposition is facilitated by proteins encoded within the gag-pol region, which is common to all autonomous elements, some LTR retrotransposons were found to potentially carry an additional protein coding capacity represented by extra open reading frames located upstream or downstream of gag-pol. In this study, we performed a comprehensive in silico survey and comparative analysis of these extra open reading frames (ORFs) in the group of Ty3/gypsy LTR retrotransposons as the first step towards understanding their origin and function. We found that extra ORFs occur in all three major lineages of plant Ty3/gypsy elements, being the most frequent in the Tat lineage, where most (77%) of the identified elements contained extra ORFs. This lineage was also characterized by the highest diversity of extra ORF arrangement (position and orientation) within the elements. On the other hand, all of these ORFs could be classified into only two broad groups based on their mutual similarities or the presence of short conserved motifs in their inferred protein sequences. In the Athila lineage, the extra ORFs were confined to the element 3' regions, but they displayed much higher sequence diversity compared to those found in Tat. In the lineage of Chromoviruses, the extra ORFs were relatively rare, occurring only in 5' regions of a group of elements present in a single plant family (Poaceae). In all three lineages, most extra ORFs lacked sequence similarities to characterized gene sequences or functional protein domains, except for two Athila-like elements with similarities to the LOGL4 gene and a subset of the Chromovirus extra ORFs that displayed partial similarity to the histone H3 gene. Thus, in these cases the extra ORFs most likely originated by transduction or recombination of cellular gene sequences.
In addition, a protein domain otherwise associated with DNA transposons has been detected in a subset of the Tat-like extra ORFs, pointing to their origin from an insertion event of a mobile element.
- MeSH
- DNA, Plant * MeSH
- Phylogeny MeSH
- Genetic Linkage MeSH
- Ferns classification genetics MeSH
- Terminal Repeat Sequences * MeSH
- Molecular Sequence Data MeSH
- Open Reading Frames * MeSH
- Gene Order MeSH
- Retroelements * MeSH
- Plant Viruses genetics MeSH
- Amino Acid Sequence MeSH
- Sequence Alignment MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant * MeSH
- Retroelements * MeSH