Most cited article - PubMed ID 26061193
Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size
Repetitive DNA contributes significantly to plant genome size, adaptation, and evolution. However, little is understood about the transcription of repeats. This is addressed here in the plant green foxtail millet (Setaria viridis). First, we used RepeatExplorer2 to calculate the genome proportion (GP) of all repeat types and compared the GP of long terminal repeat (LTR) retroelements against annotated complete and incomplete LTR retroelements (Ty1/copia and Ty3/gypsy) identified by DANTE in a whole genome assembly. We show that DANTE-identified LTR retroelements can comprise ∼0.75% of the inflorescence poly-A transcriptome and ∼0.24% of the stem ribo-depleted transcriptome. In the RNA libraries from inflorescence tissue, both LTR retroelements and DNA transposons identified by RepeatExplorer2 were highly abundant; in this tissue they may be taking advantage of the reduced epigenetic silencing in the germ line to amplify. Typically, there was a higher representation of DANTE-identified LTR retroelements in the transcriptome than RepeatExplorer2-identified LTR retroelements, potentially reflecting the transcription of elements that have insufficient genomic copy numbers to be detected by RepeatExplorer2. In contrast, for ribo-depleted libraries of stem tissues, the reverse was observed, with a higher transcriptome representation of RepeatExplorer2-identified LTR retroelements. For RepeatExplorer2-identified repeats, we show that the GP of most Ty1/copia and Ty3/gypsy families was positively correlated with their transcript proportion. In addition, guanine- and cytosine-rich repeats with high sequence similarity were also the most abundant in the transcriptome, and these likely represent young elements that are most capable of amplification due to their ability to evade epigenetic silencing.
- MeSH
- Transcription, Genetic MeSH
- Genome, Plant MeSH
- Terminal Repeat Sequences * MeSH
- Gene Expression Regulation, Plant MeSH
- Retroelements * MeSH
- Setaria Plant * genetics MeSH
- Transcriptome MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Retroelements * MeSH
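The comparison of genome proportion (GP) with transcript proportion described in the Setaria viridis abstract above can be illustrated with a minimal sketch. It assumes that each proportion is simply the percentage of reads assigned to a repeat family, and it uses a plain Pearson correlation; the abstract does not say which statistic the authors used. All family names and read counts are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def proportion(count, total):
    """Percentage of reads in a library that are assigned to one repeat family."""
    return 100.0 * count / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL read counts: (genomic reads, RNA reads) per repeat family.
genomic_total = 1_000_000
rna_total = 2_000_000
families = {
    "family_A": (50_000, 12_000),
    "family_B": (20_000, 5_000),
    "family_C": (5_000, 900),
}

# Genome proportion and transcript proportion, both in percent.
gp = {name: proportion(g, genomic_total) for name, (g, _) in families.items()}
tp = {name: proportion(r, rna_total) for name, (_, r) in families.items()}

r = pearson(list(gp.values()), list(tp.values()))
print(f"GP: {gp}")
print(f"Transcript proportion: {tp}")
print(f"Pearson r = {r:.3f}")
```

A positive r in such a table would correspond to the pattern reported in the abstract, where families with a larger GP also tend to have a larger share of the transcriptome.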
Long terminal repeat (LTR) retrotransposons constitute a predominant class of repetitive DNA elements in most plant genomes. With the increasing number of sequenced plant genomes, there is an ongoing demand for computational tools facilitating efficient annotation and classification of LTR retrotransposons in plant genome assemblies. Herein, we introduce DANTE, a computational pipeline for Domain-based ANnotation of Transposable Elements, designed for sensitive detection of these elements via their conserved protein domain sequences. The identified protein domains are subsequently inputted into the DANTE_LTR pipeline to annotate complete element sequences by detecting their structural features, such as LTRs, in adjacent genomic regions. Leveraging domain sequences allows for precise classification of elements into phylogenetic lineages, offering a more granular annotation compared with coarser conventional superfamily-based classification methods. The efficiency and accuracy of this approach were evidenced via annotation of LTR retrotransposons in 93 plant genomes. Results were benchmarked against several established pipelines, showing that DANTE_LTR is capable of identifying significantly more intact LTR retrotransposons. DANTE and DANTE_LTR are provided as user-friendly Galaxy tools accessible via a public server (https://repeatexplorer-elixir.cerit-sc.cz), installable on local Galaxy instances from the Galaxy tool shed or executable from the command line.
- Publication type
- Journal Article MeSH
Genome size varies 2400-fold across plants, influencing their evolution through changes in cell size and cell division rates which impact plants' environmental stress tolerance. Repetitive element expansion explains much genome size diversity, and the processes structuring repeat 'communities' are analogous to those structuring ecological communities. However, which environmental stressors influence repeat community dynamics has not yet been examined from an ecological perspective. We measured genome size and leveraged climatic data for 91% of genera within the ecologically diverse palm family (Arecaceae). We then generated genomic repeat profiles for 141 palm species, and analysed repeats using phylogenetically informed linear models to explore relationships between repeat dynamics and environmental factors. We show that palm genome size and repeat 'community' composition are best explained by aridity. Specifically, Ty3-gypsy and TIR elements were more abundant in palm species from wetter environments, which generally had larger genomes, suggesting amplification. By contrast, Ty1-copia and LINE elements were more abundant in drier environments. Our results suggest that water stress inhibits repeat expansion through selection on upper genome size limits. However, elements that may associate with stress-response genes (e.g. Ty1-copia) have amplified in arid-adapted palm species. Overall, we provide novel evidence of climate influencing the assembly of repeat 'communities'.
- Keywords
- Arecaceae (palms), adaptation, ecology, genome size, phylogenetic regression, plant evolution, trait evolution, transposable elements
- MeSH
- Arecaceae * genetics MeSH
- Genome Size MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Evolution, Molecular MeSH
- Retroelements * MeSH
- Sequence Analysis, DNA MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- Retroelements * MeSH
To provide insights into the fate of transposable elements (TEs) across timescales in a post-polyploidization context, we comparatively investigate five sibling Dactylorhiza allotetraploids (Orchidaceae) formed independently and sequentially between 500 and 100K generations ago by unidirectional hybridization between diploids D. fuchsii and D. incarnata. Our results first reveal that the paternal D. incarnata genome shows a markedly increased content of LTR retrotransposons compared to the maternal species, reflected in its larger genome size and consistent with a previously hypothesized bottleneck. With regard to the allopolyploids, in the youngest D. purpurella both genome size and TE composition appear to be largely additive with respect to parents, whereas for polyploids of intermediate ages we uncover rampant genome expansion on a magnitude of multiple entire genomes of some plants such as Arabidopsis. The oldest allopolyploids in the series are not larger than the intermediate ones. A putative tandem repeat, potentially derived from a non-autonomous miniature inverted-repeat TE (MITE), drives much of the genome dynamics in the allopolyploids. The highly dynamic MITE-like element is found in higher proportions in the maternal diploid, D. fuchsii, but is observed to increase in copy number in both subgenomes of the allopolyploids. Altogether, the fate of repeats appears strongly regulated and therefore predictable across multiple independent allopolyploidization events in this system. Apart from the MITE-like element, we consistently document a mild genomic shock following the allopolyploidizations investigated here, which may be linked to their relatively large genome sizes, possibly associated with strong selection against further genome expansions.
- Keywords
- allopolyploidy, genome size, genomic shock, marsh orchids, transposable elements
- MeSH
- Diploidy MeSH
- Genome, Plant MeSH
- Humans MeSH
- Wetlands MeSH
- Orchidaceae * genetics MeSH
- Polyploidy MeSH
- Siblings * MeSH
- DNA Transposable Elements genetics MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA Transposable Elements MeSH
Plant genomes are highly diverse in size and repetitive DNA composition. In the absence of polyploidy, the dynamics of repetitive elements, which make up the bulk of the genome in many species, are the main drivers underpinning changes in genome size and the overall evolution of the genomic landscape. The advent of high-throughput sequencing technologies has enabled investigation of genome evolutionary dynamics beyond model plants to provide exciting new insights in species across the biodiversity of life. Here we analyze the evolution of repetitive DNA in two closely related species of Heloniopsis (Melanthiaceae), which despite having the same chromosome number differ nearly twofold in genome size [i.e., H. umbellata (1C = 4,680 Mb), and H. koreana (1C = 2,480 Mb)]. Low-coverage genome skimming and the RepeatExplorer2 pipeline were used to identify the main repeat families responsible for the significant differences in genome sizes. Patterns of repeat evolution were found to correlate with genome size with the main classes of transposable elements identified being twice as abundant in the larger genome of H. umbellata compared with H. koreana. In addition, among the satellite DNA families recovered, a single shared satellite (HeloSAT) was shown to have contributed significantly to the genome expansion of H. umbellata. Evolutionary changes in repetitive DNA composition and genome size indicate that the differences in genome size between these species have been underpinned by the activity of several distinct repeat lineages.
- Keywords
- C-value, DNA repeats, chromosome, satellite DNA, transposable elements
- Publication type
- Journal Article MeSH
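The "nearly twofold" difference quoted in the Heloniopsis abstract above follows directly from the two 1C values it reports. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# 1C genome sizes in Mb, as given in the Heloniopsis abstract.
H_UMBELLATA_MB = 4680
H_KOREANA_MB = 2480

# Ratio and absolute difference between the two genomes.
fold = H_UMBELLATA_MB / H_KOREANA_MB
difference_mb = H_UMBELLATA_MB - H_KOREANA_MB

print(f"Fold difference: {fold:.2f}")
print(f"Absolute difference: {difference_mb} Mb")
```

The ratio comes out at about 1.89, and the larger genome carries roughly 2,200 Mb of additional DNA.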
BACKGROUND: Cultivated grasses are an important source of food for domestic animals worldwide. Increased knowledge of their genomes can speed up the development of new cultivars with better quality and greater resistance to biotic and abiotic stresses. The most widely grown grasses are tetraploid ryegrass species (Lolium) and diploid and hexaploid fescue species (Festuca). In this work, we characterized repetitive DNA sequences and their contribution to genome size in five fescue and two ryegrass species as well as one fescue and two ryegrass cultivars. RESULTS: Partial genome sequences produced by Illumina sequencing technology were used for genome-wide comparative analyses with the RepeatExplorer pipeline. Retrotransposons were the most abundant repeat type in all seven grass species. The Athila element of the Ty3/gypsy family showed the most striking differences in copy number between fescues and ryegrasses. The sequence data enabled the assembly of the long terminal repeat (LTR) element Fesreba, which is highly enriched in centromeric and (peri)centromeric regions in all species. A combination of fluorescence in situ hybridization (FISH) with a probe specific to the Fesreba element and immunostaining with centromeric histone H3 (CENH3) antibody showed their co-localization and indicated a possible role of Fesreba in centromere function. CONCLUSIONS: Comparative repeatome analyses in a set of fescues and ryegrasses provided new insights into their genome organization and divergence, including the assembly of the LTR element Fesreba. A new LTR element Fesreba was identified and found in abundance in centromeric regions of the fescues and ryegrasses. It may play a role in the function of their centromeres.
Plant genomes vary greatly in composition and size mainly due to the diversity of repetitive DNAs and the inherent propensity for their amplification and removal from the host genome. Most studies addressing repeatome dynamics focus on model organisms, whereas few provide comprehensive investigations across the genomes of related taxa. Herein, we analyze the evolution of repeats of the 13 species in Melampodium sect. Melampodium, representing all but two of its diploid taxa, in a phylogenetic context. The investigated genomes range in size from 0.49 to 2.27 pg/1C (ca. 4.5-fold variation), despite having the same base chromosome number (x = 10) and very strong phylogenetic affinities. Phylogenetic analysis performed in BEAST and ancestral genome size reconstruction revealed mixed patterns of genome size increases and decreases across the group. High-throughput genome skimming and the RepeatExplorer pipeline were utilized to determine the repeat families responsible for the differences in observed genome sizes. Patterns of repeat evolution were found to be highly correlated with phylogenetic position, namely taxonomic series circumscription. Major differences found were in the abundances of the SIRE (Ty1-copia), Athila (Ty3-gypsy), and CACTA (DNA transposon) lineages. Additionally, several satellite DNA families were found to be highly group-specific, although their overall contribution to genome size variation was relatively small. Evolutionary changes in repetitive DNA composition and genome size were complex, with independent patterns of genome up- and downsizing throughout the evolution of the analyzed diploids. A model-based analysis of genome size and repetitive DNA composition revealed evidence for strong phylogenetic signal and differential evolutionary rates of major lineages of repeats in the diploid genomes.
- Keywords
- Bayesian analysis, Melampodium, ancestral state reconstruction, genome size, phylogenetics, repetitive DNA, tandem repeats, transposable elements
- Publication type
- Journal Article MeSH
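The Melampodium abstract above reports genome sizes in picograms per 1C, whereas other abstracts in this list use megabases. The sketch below converts the two extreme values to Mbp so they can be compared. It assumes the commonly used conversion factor of 1 pg ≈ 978 Mbp, which is not stated in the abstract itself; the function name is illustrative.

```python
# ASSUMPTION: 1 pg of DNA corresponds to about 978 Mbp (a widely used
# conversion factor; it is not given in the abstract).
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


# Smallest and largest 1C values reported in the Melampodium abstract.
smallest_mbp = pg_to_mbp(0.49)
largest_mbp = pg_to_mbp(2.27)

print(f"0.49 pg/1C is about {smallest_mbp:.0f} Mbp")
print(f"2.27 pg/1C is about {largest_mbp:.0f} Mbp")
```

Under this assumption the range corresponds to roughly 479 Mbp to 2,220 Mbp.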
Knowledge of the fascinating world of DNA repeats is continuously being enriched by newly identified elements and their hypothetical or well-established biological relevance. Genomic approaches can be used for comparative studies of major repeats in any group of genomes, regardless of their size and complexity. Such studies are particularly fruitful in large genomes, and useful mainly in crop plants where they provide a rich source of molecular markers or information on indispensable genomic components (e.g., telomeres, centromeres, or ribosomal RNA genes). Surprisingly, in Allium species, a comprehensive comparative study of repeats is lacking. Here we provide such a study of two economically important species, Allium cepa (onion) and A. sativum (garlic), and their distant relative A. ursinum (wild garlic). We present an overview and classification of major repeats in these species and have paid specific attention to sequence conservation and copy numbers of major representatives in each type of repeat, including retrotransposons, rDNA, or newly identified satellite sequences. Prevailing repeats in all three studied species belonged to Ty3/gypsy elements; however, they have diverged significantly, and we did not detect them in common clusters in the comparative analysis. In fact, only a small number of clusters was shared by all three species. Such conserved repeats were, for example, 5S and 45S rDNA genes and, surprisingly, a specific and quite rare Ty1/copia lineage. Species-specific long satellites were found mainly in A. cepa and A. sativum. We also show in situ localization of selected repeats that could potentially be applicable as chromosomal markers, e.g., in interspecific breeding.
- Keywords
- Allium, RepeatExplorer, TAREAN, plant genome, rDNA, repeats, retrotransposon, satellite, telomere
- MeSH
- Allium classification genetics MeSH
- Chromosomes, Plant MeSH
- Genome, Plant * MeSH
- Genomics * methods MeSH
- In Situ Hybridization, Fluorescence MeSH
- Nucleotide Motifs MeSH
- Retroelements MeSH
- DNA, Satellite MeSH
- Tandem Repeat Sequences MeSH
- Telomere MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Retroelements MeSH
- DNA, Satellite MeSH
BACKGROUND: Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons. RESULTS: We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages respectively. We also characterized various features of LTR-retrotransposon sequences including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains. CONCLUSIONS: We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes. REXdb-related tools can be accessed through the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or by using a standalone version of REXdb that can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
- Keywords
- LTR-retrotransposons, Polyprotein domains, Primer binding site, RepeatExplorer, Transposable elements
- Publication type
- Journal Article MeSH
Allopolyploidy has played an important role in the evolution of the flowering plants. Genome mergers are often accompanied by significant and rapid alterations of genome size and structure via chromosomal rearrangements and altered dynamics of tandem and dispersed repetitive DNA families. Recent developments in sequencing technologies and bioinformatic methods allow for a comprehensive investigation of the repetitive component of plant genomes. Interpretation of evolutionary dynamics following allopolyploidization requires both the knowledge of parentage and the age of origin of an allopolyploid. Whereas parentage is typically inferred from cytogenetic and phylogenetic data, age inference is hampered by the reticulate nature of the phylogenetic relationships. Treating subgenomes of allopolyploids as if they belonged to different species (i.e., no recombination among subgenomes) and applying cross-bracing (i.e., putting a constraint on the age difference of nodes pertaining to the same event), we can infer the age of allopolyploids within the framework of the multispecies coalescent within BEAST2. Together with a comprehensive characterization of the repetitive DNA fraction using the RepeatExplorer pipeline, we apply the dating approach in a group of closely related allopolyploids and their progenitor species in the plant genus Melampodium (Asteraceae). We dated the origin of both the allotetraploid, Melampodium strigosum, and its two allohexaploid derivatives, Melampodium pringlei and Melampodium sericeum, which share both parentage and the direction of the cross, to the Pleistocene (<1.4 Ma). Thus, Pleistocene climatic fluctuations may have triggered formation of allopolyploids possibly in short intervals, contributing to difficulties in inferring the precise temporal order of allopolyploid species divergence of M. sericeum and M. pringlei. The relatively recent origin of the allopolyploids likely played a role in the near-absence of major changes in the repetitive fraction of the polyploids' genomes. The repetitive elements most affected by the postpolyploidization changes represented retrotransposons of the Ty1-copia lineage Maximus and, to a lesser extent, also Athila elements of the Ty3-gypsy family.
- MeSH
- Asteraceae classification genetics MeSH
- DNA, Plant genetics MeSH
- Phylogeny MeSH
- Genome, Plant genetics MeSH
- Evolution, Molecular * MeSH
- Polyploidy MeSH
- Repetitive Sequences, Nucleic Acid genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH