Most cited article - PubMed ID 26061193
Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size
Repetitive DNA contributes significantly to plant genome size, adaptation, and evolution. However, little is understood about the transcription of repeats. This is addressed here in the plant green foxtail millet (Setaria viridis). First, we used RepeatExplorer2 to calculate the genome proportion (GP) of all repeat types and compared the GP of long terminal repeat (LTR) retroelements against annotated complete and incomplete LTR retroelements (Ty1/copia and Ty3/gypsy) identified by DANTE in a whole genome assembly. We show that DANTE-identified LTR retroelements can comprise ∼0.75% of the inflorescence poly-A transcriptome and ∼0.24% of the stem ribo-depleted transcriptome. In the RNA libraries from inflorescence tissue, both LTR retroelements and DNA transposons identified by RepeatExplorer2 were highly abundant, where they may be taking advantage of the reduced epigenetic silencing in the germ line to amplify. Typically, there was a higher representation of DANTE-identified LTR retroelements in the transcriptome than RepeatExplorer2-identified LTR retroelements, potentially reflecting the transcription of elements that have insufficient genomic copy numbers to be detected by RepeatExplorer2. In contrast, for ribo-depleted libraries of stem tissues, the reverse was observed, with a higher transcriptome representation of RepeatExplorer2-identified LTR retroelements. For RepeatExplorer2-identified repeats, we show that the GP of most Ty1/copia and Ty3/gypsy families were positively correlated with their transcript proportion. In addition, guanine- and cytosine-rich repeats with high sequence similarity were also the most abundant in the transcriptome, and these likely represent young elements that are most capable of amplification due to their ability to evade epigenetic silencing.
Long terminal repeat (LTR) retrotransposons constitute a predominant class of repetitive DNA elements in most plant genomes. With the increasing number of sequenced plant genomes, there is an ongoing demand for computational tools facilitating efficient annotation and classification of LTR retrotransposons in plant genome assemblies. Herein, we introduce DANTE, a computational pipeline for Domain-based ANnotation of Transposable Elements, designed for sensitive detection of these elements via their conserved protein domain sequences. The identified protein domains are subsequently inputted into the DANTE_LTR pipeline to annotate complete element sequences by detecting their structural features, such as LTRs, in adjacent genomic regions. Leveraging domain sequences allows for precise classification of elements into phylogenetic lineages, offering a more granular annotation compared with coarser conventional superfamily-based classification methods. The efficiency and accuracy of this approach were evidenced via annotation of LTR retrotransposons in 93 plant genomes. Results were benchmarked against several established pipelines, showing that DANTE_LTR is capable of identifying significantly more intact LTR retrotransposons. DANTE and DANTE_LTR are provided as user-friendly Galaxy tools accessible via a public server (https://repeatexplorer-elixir.cerit-sc.cz), installable on local Galaxy instances from the Galaxy tool shed or executable from the command line.
- Publication type
- journal article MeSH
Genome size varies 2400-fold across plants, influencing their evolution through changes in cell size and cell division rates which impact plants' environmental stress tolerance. Repetitive element expansion explains much genome size diversity, and the processes structuring repeat 'communities' are analogous to those structuring ecological communities. However, which environmental stressors influence repeat community dynamics has not yet been examined from an ecological perspective. We measured genome size and leveraged climatic data for 91% of genera within the ecologically diverse palm family (Arecaceae). We then generated genomic repeat profiles for 141 palm species, and analysed repeats using phylogenetically informed linear models to explore relationships between repeat dynamics and environmental factors. We show that palm genome size and repeat 'community' composition are best explained by aridity. Specifically, Ty3-gypsy and TIR elements were more abundant in palm species from wetter environments, which generally had larger genomes, suggesting amplification. By contrast, Ty1-copia and LINE elements were more abundant in drier environments. Our results suggest that water stress inhibits repeat expansion through selection on upper genome size limits. However, elements that may associate with stress-response genes (e.g. Ty1-copia) have amplified in arid-adapted palm species. Overall, we provide novel evidence of climate influencing the assembly of repeat 'communities'.
- Keywords
- Arecaceae (palms), adaptation, ecology, genome size, phylogenetic regression, plant evolution, trait evolution, transposable elements,
- MeSH
- Arecaceae * genetics MeSH
- genome size MeSH
- phylogeny MeSH
- genome, plant MeSH
- evolution, molecular MeSH
- retroelements * MeSH
- sequence analysis, DNA MeSH
- Publication type
- journal article MeSH
- research supported by a grant MeSH
- Substance names
- retroelements * MeSH
To provide insights into the fate of transposable elements (TEs) across timescales in a post-polyploidization context, we comparatively investigate five sibling Dactylorhiza allotetraploids (Orchidaceae) formed independently and sequentially between 500 and 100K generations ago by unidirectional hybridization between diploids D. fuchsii and D. incarnata. Our results first reveal that the paternal D. incarnata genome shows a marked increased content of LTR retrotransposons compared to the maternal species, reflected in its larger genome size and consistent with a previously hypothesized bottleneck. With regard to the allopolyploids, in the youngest D. purpurella both genome size and TE composition appear to be largely additive with respect to parents, whereas for polyploids of intermediate ages we uncover rampant genome expansion on a magnitude of multiple entire genomes of some plants such as Arabidopsis. The oldest allopolyploids in the series are not larger than the intermediate ones. A putative tandem repeat, potentially derived from a non-autonomous miniature inverted-repeat TE (MITE) drives much of the genome dynamics in the allopolyploids. The highly dynamic MITE-like element is found in higher proportions in the maternal diploid, D. fuchsii, but is observed to increase in copy number in both subgenomes of the allopolyploids. Altogether, the fate of repeats appears strongly regulated and therefore predictable across multiple independent allopolyploidization events in this system. Apart from the MITE-like element, we consistently document a mild genomic shock following the allopolyploidizations investigated here, which may be linked to their relatively large genome sizes, possibly associated with strong selection against further genome expansions.
- Keywords
- allopolyploidy, genome size, genomic shock, marsh orchids, transposable elements,
- MeSH
- diploidy MeSH
- genome, plant MeSH
- humans MeSH
- wetlands MeSH
- Orchidaceae * genetics MeSH
- polyploidy MeSH
- siblings * MeSH
- DNA transposable elements genetics MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- research supported by a grant MeSH
- Substance names
- DNA transposable elements MeSH
Plant genomes are highly diverse in size and repetitive DNA composition. In the absence of polyploidy, the dynamics of repetitive elements, which make up the bulk of the genome in many species, are the main drivers underpinning changes in genome size and the overall evolution of the genomic landscape. The advent of high-throughput sequencing technologies has enabled investigation of genome evolutionary dynamics beyond model plants to provide exciting new insights in species across the biodiversity of life. Here we analyze the evolution of repetitive DNA in two closely related species of Heloniopsis (Melanthiaceae), which despite having the same chromosome number differ nearly twofold in genome size [i.e., H. umbellata (1C = 4,680 Mb), and H. koreana (1C = 2,480 Mb)]. Low-coverage genome skimming and the RepeatExplorer2 pipeline were used to identify the main repeat families responsible for the significant differences in genome sizes. Patterns of repeat evolution were found to correlate with genome size with the main classes of transposable elements identified being twice as abundant in the larger genome of H. umbellata compared with H. koreana. In addition, among the satellite DNA families recovered, a single shared satellite (HeloSAT) was shown to have contributed significantly to the genome expansion of H. umbellata. Evolutionary changes in repetitive DNA composition and genome size indicate that the differences in genome size between these species have been underpinned by the activity of several distinct repeat lineages.
- Keywords
- C-value, DNA repeats, chromosome, satellite DNA, transposable elements,
- Publication type
- journal article MeSH
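The "nearly twofold" difference reported for the two Heloniopsis genomes in the abstract above can be checked with a short calculation. This is only an illustrative sketch: the 1C values are taken from the abstract, while the dictionary and the helper name `fold_difference` are assumptions introduced here and are not part of the study.

```python
# 1C genome sizes in Mb, as quoted in the Heloniopsis abstract.
genome_size_mb = {"H. umbellata": 4680, "H. koreana": 2480}


def fold_difference(larger: float, smaller: float) -> float:
    """Return how many times larger the first genome is than the second."""
    return larger / smaller


ratio = fold_difference(
    genome_size_mb["H. umbellata"], genome_size_mb["H. koreana"]
)
# Absolute amount of DNA by which the larger genome exceeds the smaller one.
extra_mb = genome_size_mb["H. umbellata"] - genome_size_mb["H. koreana"]

print(f"fold difference: {ratio:.2f}")
print(f"additional DNA in H. umbellata: {extra_mb} Mb")
```

The ratio comes out just under 1.9, consistent with the abstract's wording of a nearly twofold difference.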
BACKGROUND: Cultivated grasses are an important source of food for domestic animals worldwide. Increased knowledge of their genomes can speed up the development of new cultivars with better quality and greater resistance to biotic and abiotic stresses. The most widely grown grasses are tetraploid ryegrass species (Lolium) and diploid and hexaploid fescue species (Festuca). In this work, we characterized repetitive DNA sequences and their contribution to genome size in five fescue and two ryegrass species as well as one fescue and two ryegrass cultivars. RESULTS: Partial genome sequences produced by Illumina sequencing technology were used for genome-wide comparative analyses with the RepeatExplorer pipeline. Retrotransposons were the most abundant repeat type in all seven grass species. The Athila element of the Ty3/gypsy family showed the most striking differences in copy number between fescues and ryegrasses. The sequence data enabled the assembly of the long terminal repeat (LTR) element Fesreba, which is highly enriched in centromeric and (peri)centromeric regions in all species. A combination of fluorescence in situ hybridization (FISH) with a probe specific to the Fesreba element and immunostaining with centromeric histone H3 (CENH3) antibody showed their co-localization and indicated a possible role of Fesreba in centromere function. CONCLUSIONS: Comparative repeatome analyses in a set of fescues and ryegrasses provided new insights into their genome organization and divergence, including the assembly of the LTR element Fesreba. A new LTR element Fesreba was identified and found in abundance in centromeric regions of the fescues and ryegrasses. It may play a role in the function of their centromeres.
- Keywords
- Centromere organization, Festuca, Illumina sequencing, Lolium, Repetitive DNA,
- MeSH
- centromere genetics MeSH
- chromosomes, plant * MeSH
- Festuca genetics MeSH
- genome, plant genetics MeSH
- Lolium genetics MeSH
- repetitive sequences, nucleic acid * MeSH
- Publication type
- journal article MeSH
Plant genomes vary greatly in composition and size mainly due to the diversity of repetitive DNAs and the inherent propensity for their amplification and removal from the host genome. Most studies addressing repeatome dynamics focus on model organisms, whereas few provide comprehensive investigations across the genomes of related taxa. Herein, we analyze the evolution of repeats of the 13 species in Melampodium sect. Melampodium, representing all but two of its diploid taxa, in a phylogenetic context. The investigated genomes range in size from 0.49 to 2.27 pg/1C (ca. 4.5-fold variation), despite having the same base chromosome number (x = 10) and very strong phylogenetic affinities. Phylogenetic analysis performed in BEAST and ancestral genome size reconstruction revealed mixed patterns of genome size increases and decreases across the group. High-throughput genome skimming and the RepeatExplorer pipeline were utilized to determine the repeat families responsible for the differences in observed genome sizes. Patterns of repeat evolution were found to be highly correlated with phylogenetic position, namely taxonomic series circumscription. Major differences found were in the abundances of the SIRE (Ty1-copia), Athila (Ty3-gypsy), and CACTA (DNA transposon) lineages. Additionally, several satellite DNA families were found to be highly group-specific, although their overall contribution to genome size variation was relatively small. Evolutionary changes in repetitive DNA composition and genome size were complex, with independent patterns of genome up- and downsizing throughout the evolution of the analyzed diploids. A model-based analysis of genome size and repetitive DNA composition revealed evidence for strong phylogenetic signal and differential evolutionary rates of major lineages of repeats in the diploid genomes.
- Keywords
- Bayesian analysis, Melampodium, ancestral state reconstruction, genome size, phylogenetics, repetitive DNA, tandem repeats, transposable elements,
- Publication type
- journal article MeSH
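The Melampodium abstract above gives genome sizes in picograms (0.49 to 2.27 pg/1C). As a rough sketch, these can be converted to megabase pairs and the fold variation recomputed. The conversion factor of 1 pg = 978 Mbp is the commonly used value from Doležel et al. (2003) and is an assumption added here, not something stated in the abstract. The function name `pg_to_mbp` is also illustrative.

```python
# Commonly used conversion factor (assumption, not from the abstract):
# 1 pg of DNA corresponds to about 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a DNA amount in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


# Smallest and largest 1C values quoted in the Melampodium abstract.
smallest_pg, largest_pg = 0.49, 2.27

smallest_mbp = pg_to_mbp(smallest_pg)
largest_mbp = pg_to_mbp(largest_pg)
fold_variation = largest_pg / smallest_pg

print(f"smallest genome: {smallest_mbp:.0f} Mbp")
print(f"largest genome: {largest_mbp:.0f} Mbp")
print(f"fold variation: {fold_variation:.2f}")
```

With the rounded endpoints from the abstract the ratio is about 4.6, in line with the stated "ca. 4.5-fold" variation. The small gap may reflect rounding of the published values.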
Knowledge of the fascinating world of DNA repeats is continuously being enriched by newly identified elements and their hypothetical or well-established biological relevance. Genomic approaches can be used for comparative studies of major repeats in any group of genomes, regardless of their size and complexity. Such studies are particularly fruitful in large genomes, and useful mainly in crop plants where they provide a rich source of molecular markers or information on indispensable genomic components (e.g., telomeres, centromeres, or ribosomal RNA genes). Surprisingly, in Allium species, a comprehensive comparative study of repeats is lacking. Here we provide such a study of two economically important species, Allium cepa (onion), and A. sativum (garlic), and their distantly related A. ursinum (wild garlic). We present an overview and classification of major repeats in these species and have paid specific attention to sequence conservation and copy numbers of major representatives in each type of repeat, including retrotransposons, rDNA, or newly identified satellite sequences. Prevailing repeats in all three studied species belonged to Ty3/gypsy elements, however they significantly diverged and we did not detect them in common clusters in comparative analysis. Actually, only a low number of clusters was shared by all three species. Such conserved repeats were for example 5S and 45S rDNA genes and surprisingly a specific and quite rare Ty1/copia lineage. Species-specific long satellites were found mainly in A. cepa and A. sativum. We also show in situ localization of selected repeats that could potentially be applicable as chromosomal markers, e.g., in interspecific breeding.
- Keywords
- Allium, RepeatExplorer, TAREAN, plant genome, rDNA, repeats, retrotransposon, satellite, telomere,
- MeSH
- Allium classification genetics MeSH
- chromosomes, plant MeSH
- genome, plant * MeSH
- genomics * methods MeSH
- in situ hybridization, fluorescence MeSH
- nucleotide motifs MeSH
- retroelements MeSH
- DNA, satellite MeSH
- tandem repeat sequences MeSH
- telomere MeSH
- computational biology methods MeSH
- Publication type
- journal article MeSH
- Substance names
- retroelements MeSH
- DNA, satellite MeSH
BACKGROUND: Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons. RESULTS: We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages respectively. We also characterized various features of LTR-retrotransposon sequences including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains. CONCLUSIONS: We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes.
Access to REXdb-related tools is provided through the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or through a standalone version of REXdb that can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
- Keywords
- LTR-retrotransposons, Polyprotein domains, Primer binding site, RepeatExplorer, Transposable elements,
- Publication type
- journal article MeSH
Allopolyploidy has played an important role in the evolution of the flowering plants. Genome mergers are often accompanied by significant and rapid alterations of genome size and structure via chromosomal rearrangements and altered dynamics of tandem and dispersed repetitive DNA families. Recent developments in sequencing technologies and bioinformatic methods allow for a comprehensive investigation of the repetitive component of plant genomes. Interpretation of evolutionary dynamics following allopolyploidization requires both the knowledge of parentage and the age of origin of an allopolyploid. Whereas parentage is typically inferred from cytogenetic and phylogenetic data, age inference is hampered by the reticulate nature of the phylogenetic relationships. Treating subgenomes of allopolyploids as if they belonged to different species (i.e., no recombination among subgenomes) and applying cross-bracing (i.e., putting a constraint on the age difference of nodes pertaining to the same event), we can infer the age of allopolyploids within the framework of the multispecies coalescent within BEAST2. Together with a comprehensive characterization of the repetitive DNA fraction using the RepeatExplorer pipeline, we apply the dating approach in a group of closely related allopolyploids and their progenitor species in the plant genus Melampodium (Asteraceae). We dated the origin of both the allotetraploid, Melampodium strigosum, and its two allohexaploid derivatives, Melampodium pringlei and Melampodium sericeum, which share both parentage and the direction of the cross, to the Pleistocene (<1.4 Ma). Thus, Pleistocene climatic fluctuations may have triggered formation of allopolyploids possibly in short intervals, contributing to difficulties in inferring the precise temporal order of allopolyploid species divergence of M. sericeum and M. pringlei.
The relatively recent origin of the allopolyploids likely played a role in the near-absence of major changes in the repetitive fraction of the polyploids' genomes. The repetitive elements most affected by the postpolyploidization changes represented retrotransposons of the Ty1-copia lineage Maximus and, to a lesser extent, also Athila elements of Ty3-gypsy family.
- MeSH
- Asteraceae classification genetics MeSH
- DNA, plant genetics MeSH
- phylogeny MeSH
- genome, plant genetics MeSH
- evolution, molecular * MeSH
- polyploidy MeSH
- repetitive sequences, nucleic acid genetics MeSH
- Publication type
- journal article MeSH
- research supported by a grant MeSH
- Substance names
- DNA, plant MeSH