Most cited article - PubMed ID 25290698
Bursts of transposable elements as an evolutionary driving force
BACKGROUND: The centromere is one of the key regions of the eukaryotic chromosome. While maintaining its function, centromeric DNA may differ among closely related species. Here, we explored the composition and structure of the pericentromeres (chromosomal regions that include a functional centromere) of Hieracium alpinum (Asteraceae), a member of one of the most diverse genera in the plant kingdom. Previously, we identified a pericentromere-specific tandem repeat that made it possible to distinguish reads within the Oxford Nanopore library attributed to the pericentromeres, separating them into a discrete subset and allowing comparison of the repeatome composition of this subset with the remaining genome. RESULTS: We found that the main satellite DNA (satDNA) monomer forms long arrays of linear and block types in the pericentromeric heterochromatin of H. alpinum; very often, single reads contain forward and reverse arrays that mirror each other. Besides the major family, two new minor satDNA families were discovered. In addition to satDNAs, high amounts of LTR retrotransposons (TEs), dominated by the Tekay lineage, were detected in the pericentromeres. We were able to reconstruct four main TEs of the Ty3-gypsy and Ty1-copia superfamilies and compare their relative positions with satDNAs. This comparison showed that the conserved domains (CDs) of the TE proteins are located between the newly discovered satDNAs, which appear to be parts of ancient Tekay LTRs that we were able to reconstruct. The dominant satDNA monomer shows a certain similarity to the GAG CD of the Angela retrotransposon. CONCLUSIONS: The species-specific pericentromeric arrays of the H. alpinum genome are heterogeneous, exhibiting both linear and block type structures. 
The abundance of forward and reverse arrays of the main satDNA monomer points to multiple microinversions, which could be the main mechanism of rapid structural evolution, stochastically making each individual pericentromeric structure unique. Traces of TE insertion waves remain in pericentromeres for a long time, thus "keeping memories" of past genomic events. We counted at least four waves of TE insertions. In pericentromeres, fragments of TEs can be transformed into satDNA, which constitutes a background pool of minor families that, under certain conditions, can replace the dominant one(s).
- Keywords
- Asteraceae, Hieracium, Oxford Nanopore Technology sequencing, Pericentromeres, Plants, Satellite DNA, Transposable elements
- Publication type
- Journal Article MeSH
Non-coding repetitive DNA (repeatome) is an active part of the nuclear genome, involved in its structure, evolution and function. It is dominated by transposable elements (TEs) and satellite DNA and is prone to the most rapid changes over time. TE activity presumably causes global genome reorganization and may play an adaptive or regulatory role in response to environmental challenges. This assumption is applied here for the first time to plants from the Cape Floristic hotspot to determine whether changes in repetitive DNA are related to responses to a harsh but extremely species-rich environment. The genus Pteronia (Asteraceae) serves as a suitable model group because it shows considerable variation in genome size at the diploid level and has high and nearly equal levels of endemism in the two main Cape biomes, Fynbos and Succulent Karoo. First, we constructed a phylogeny based on multiple low-copy genes that served as a phylogenetic framework for detecting quantitative and qualitative changes in the repeatome. Second, we performed a comparative analysis of the environments of two groups of Pteronia differing in their TE bursts. Our results suggest that the environmental transition from the Succulent Karoo to the Fynbos is accompanied by a TE burst, which is likely also driving phylogenetic divergence. We thus hypothesize that analysis of the rapidly evolving repeatome could serve as an important proxy for determining the molecular basis of lineage divergence in rapidly radiating groups.
- Keywords
- Greater Cape Floristic Region (GCFR), HybSeq, Pteronia, genome size, niche modelling, repeatome
- Publication type
- Journal Article MeSH
Plant genomes consist, to a considerable extent, of non-coding repetitive DNA. Several studies showed that phylogenetic signals can be extracted from such repeatome data by using among-species dissimilarities from the RepeatExplorer2 pipeline as distance measures. Here, we advanced this approach by adjusting the read input for comparative clustering indirectly proportional to genome size and by summarizing all clusters into a main distance matrix subjected to Neighbor Joining algorithms and Principal Coordinate Analyses. Thus, our multivariate statistical method works as a "repeatomic fingerprint," and we proved its power and limitations by exemplarily applying it to the family Rosaceae at intrafamilial and, in the genera Fragaria and Rosa, at the intrageneric level. Since both taxa are prone to hybridization events, we wanted to show whether repeatome data are suitable to unravel the origin of natural and synthetic hybrids. In addition, we compared the results based on complete repeatomes with those from ribosomal DNA clusters only, because they represent one of the most widely used barcoding markers. Our results demonstrated that repeatome data contained a clear phylogenetic signal supporting the current subfamilial classification within Rosaceae. Accordingly, the well-accepted major evolutionary lineages within Fragaria were distinguished, and hybrids showed intermediate positions between parental species in data sets retrieved from both complete repeatomes and rDNA clusters. Within the taxonomically more complicated and particularly frequently hybridizing genus Rosa, we detected rather weak phylogenetic signals but surprisingly found a geographic pattern at a population scale. In sum, our method revealed promising results at larger taxonomic scales as well as within taxa with manageable levels of reticulation, but success remained rather taxon specific. 
Since repeatomes can be retrieved with little technical effort and at comparatively low cost, even from samples of rather poor DNA quality, our phylogenomic method serves as a valuable alternative when high-quality genomes are unavailable, for example, in the case of old museum specimens.
- Keywords
- Caninae, Fragaria, Rosaceae, graph-based clustering, high-throughput sequencing, phylogenetics, repeatome, repetitive DNA
- Publication type
- Journal Article MeSH
The repetitive content of the plant genome (repeatome) often represents its largest fraction and is frequently correlated with its size. Transposable elements (TEs), the main component of the repeatome, are an important driver of genome diversification due to their fast-evolving nature. Hybridization and polyploidization events are hypothesized to induce massive bursts of TEs resulting, among other effects, in an increase of copy number and genome size. Little is known about the repeatome dynamics following hybridization and polyploidization in plants that reproduce by apomixis (asexual reproduction via seeds). To address this, we analyzed the repeatomes of two diploid parental species, Hieracium intybaceum and H. prenanthoides (sexual), their synthetic diploid F1 hybrids and their natural triploid hybrids (H. pallidiflorum and H. picroides, apomictic). Using low-coverage next-generation sequencing (NGS) and a graph-based clustering approach, we detected high overall similarity across all major repeatome categories between the parental species, despite their large phylogenetic distance. Medium and highly abundant repetitive elements comprise ∼70% of Hieracium genomes; most prevalent were Ty3/Gypsy chromovirus Tekay and Ty1/Copia Maximus-SIRE elements. No TE bursts were detected in either synthetic or natural hybrids, as TE abundance generally followed theoretical expectations based on parental genome dosage. Slight over- and under-representation of TE cluster abundances reflected individual differences in genome size. However, in comparative analyses, apomicts displayed an overabundance of pararetrovirus clusters not observed in synthetic hybrids. Substantial deviations were detected in rDNAs and satellite repeats, but these patterns were sample specific. rDNA and satellite repeats (three of which were newly developed as cytogenetic markers) were localized on chromosomes by fluorescence in situ hybridization (FISH). 
In a few cases, low-abundant repeats (5S rDNA and certain satellites) showed some discrepancy between NGS data and FISH results, which is due partly to the bias of low-coverage sequencing and partly to low amounts of the satellite repeats or their sequence divergence. Overall, satellite DNA (including rDNA) was markedly affected by hybridization, but independent of the ploidy or reproductive mode of the progeny, whereas bursts of TEs did not play an important role in the evolutionary history of Hieracium.
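The "theoretical expectations based on parental genome dosage" mentioned in this abstract can be illustrated with a small calculation. The following Python sketch is only an assumption about how such an expectation might be formed: it weights each parent's repeat proportion by the number of genome copies it contributes and by its genome size. The function name, the parent labels and all numbers are hypothetical and are not taken from the study.

```python
def expected_abundance(parents, dosage):
    """Expected genome proportion of one repeat cluster in a hybrid.

    parents: dict mapping parent name -> (genome size in pg/1C,
             proportion of that genome occupied by the repeat, 0..1)
    dosage:  dict mapping parent name -> number of genome copies
             that parent contributes to the hybrid
    All values are illustrative assumptions, not data from the paper.
    """
    total_dna = 0.0   # total DNA contributed by all parental genome copies
    repeat_dna = 0.0  # DNA of the repeat cluster within that total
    for name, copies in dosage.items():
        size, proportion = parents[name]
        total_dna += copies * size
        repeat_dna += copies * size * proportion
    if total_dna == 0:
        raise ValueError("dosage must include at least one genome copy")
    return repeat_dna / total_dna


# Hypothetical parents: (genome size, repeat proportion)
parents = {"A": (3.0, 0.10), "B": (4.0, 0.20)}

# Diploid F1 hybrid: one genome copy from each parent
f1_expected = expected_abundance(parents, {"A": 1, "B": 1})

# Triploid hybrid: two genome copies from A, one from B
triploid_expected = expected_abundance(parents, {"A": 2, "B": 1})

print(f"F1 expected proportion:       {f1_expected:.4f}")
print(f"Triploid expected proportion: {triploid_expected:.4f}")
```

Under this simplified additive model, an observed abundance well above the expected value would be a candidate signal of a TE burst, while values close to it are consistent with simple inheritance of parental repeat content.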
- Keywords
- RepeatExplorer, apomixis, hawkweed, hybridization, next-generation sequencing, polyploidization, repeatome
- Publication type
- Journal Article MeSH
Satellite DNA (satDNA) is one of the major fractions of the eukaryotic nuclear genome. Highly variable satDNA is involved in various genome functions, and a clear link between satellites and phenotypes exists in a wide range of organisms. However, little is known about the origin and temporal dynamics of satDNA. The "library hypothesis" indicates that the rapid evolutionary changes experienced by satDNAs are mostly quantitative. Although this hypothesis has received some confirmation, a number of its aspects are still controversial. A recently developed next-generation sequencing (NGS) method allows the determination of the satDNA landscape and could shed light on unresolved issues. Here, we explore low-coverage NGS data to infer satDNA evolution in the phylogenetic context of the diploid species of the Chenopodium album aggregate. The application of the Illumina read assembly algorithm in combination with Oxford Nanopore sequencing and fluorescent in situ hybridization allowed the estimation of eight satDNA families within the studied group, six of which were newly described. The obtained set of satDNA families of different origins can be divided into several categories, namely group-specific, lineage-specific and species-specific. In the process of evolution, satDNA families can be transmitted vertically and can be eliminated over time. Moreover, transposable element-derived satDNA families may appear repeatedly in the satellitome, creating an illusion of family conservation. Thus, the obtained data refute the "library hypothesis", rather than confirming it, and in our opinion, it is more appropriate to speak about "the library of the mechanisms of origin".
- MeSH
- Chenopodium album genetics growth & development MeSH
- Diploidy * MeSH
- DNA, Plant analysis genetics MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Genome, Plant * MeSH
- Gene Library MeSH
- Evolution, Molecular * MeSH
- DNA, Satellite analysis genetics MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH
Extensive and complex links exist between transposable elements (TEs) and satellite DNA (satDNA), which are the two largest fractions of the eukaryotic genome. These relationships have a crucial effect on genome structure, function and evolution. Here, we report a novel case of mutual relationships between TEs and satDNA. In the genomes of Chenopodium s. str. species, the deletion derivatives of the tnp2 conserved domain of the newly discovered CACTA-like TE Jozin are involved in generating monomers of the most abundant satDNA family of the Chenopodium satellitome. The analysis of the relative positions of satDNA and different TEs utilizing assembled Illumina reads revealed several associations between satDNA arrays and the transposases of putative CACTA-like elements when an ~40 bp fragment of tnp2 served as the start monomer of the satDNA array. The high degree of identity of the consensus satDNA monomers of the investigated species and the tnp2 fragment (from 82.1 to 94.9%) provides evidence of the genesis of CficCl-61-40 satDNA family monomers from analogous regions of their respective parental elements. The results were confirmed via molecular genetic methods and Oxford Nanopore sequencing. The discovered phenomenon leads to the continuous replenishment of species genomes with new identical satDNA monomers, which in turn may increase the similarity of species satellitomes.
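The identity values quoted above (82.1 to 94.9%) compare a consensus satDNA monomer with a tnp2 fragment. As a minimal sketch of how such a percentage can be computed, the Python function below counts matching positions in two already aligned sequences of equal length, skipping gap columns. The function name and the two 10 bp sequences are hypothetical placeholders; the abstract does not state which alignment tool or identity definition the authors used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Both strings must have the same length; '-' marks an alignment gap.
    Columns containing a gap in either sequence are ignored, which is
    only one of several common identity definitions (an assumption here).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a base
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical toy sequences, not real monomer or tnp2 data
monomer = "ACGTACGTAC"
fragment = "ACGTTCGTAC"
identity = percent_identity(monomer, fragment)
print(f"{identity:.1f}% identity")
```

For a monomer of about 40 bp, each mismatch changes the identity by roughly 2.5 percentage points, so the reported range corresponds to only a handful of differing positions.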
- Keywords
- CACTA transposons, Chenopodium, Next generation sequencing, Oxford Nanopore sequencing, Satellite DNA, Transposase
- Publication type
- Journal Article MeSH
Retrotransposable elements are widely distributed and diverse in eukaryotes. Their copy number increases through reverse-transcription-mediated propagation, while they can be lost through recombinational processes, generating genomic rearrangements. We previously identified extensive structurally uniform retrotransposon groups in which no member contains the gag, pol, or env internal domains. Because of the lack of protein-coding capacity, these groups are non-autonomous in replication, even if transcriptionally active. The Cassandra element belongs to the non-autonomous group called terminal-repeat retrotransposons in miniature (TRIM). It carries 5S RNA sequences with conserved RNA polymerase (pol) III promoters and terminators in its long terminal repeats (LTRs). Here, we identified multiple extended tandem arrays of Cassandra retrotransposons within different plant species, including ferns. At least 12 copies of repeated LTRs (as the tandem unit) and internal domain (as a spacer), giving a pattern that resembles the cellular 5S rRNA genes, were identified. A cytogenetic analysis revealed the specific chromosomal pattern of the Cassandra retrotransposon with prominent clustering at and around 5S rDNA loci. The secondary structure of the Cassandra retroelement RNA is predicted to form super-loops, in which the two LTRs are complementary to each other and can initiate local recombination, leading to the tandem arrays of Cassandra elements. The array structures are conserved for Cassandra retroelements of different species. We speculate that recombination events similar to those of 5S rRNA genes may explain the wide variation in Cassandra copy number. Likewise, the organization of 5S rRNA gene sequences is very variable in flowering plants; part of what is taken for 5S gene copy variation may be variation in Cassandra number. The role of the Cassandra 5S sequences remains to be established.
- Keywords
- 5S RNA gene, Cassandra TRIM, ectopic recombination, genome evolution, long tandem array, retrotransposon
- MeSH
- Chromosomes, Insect MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Genomics methods MeSH
- Host-Parasite Interactions genetics MeSH
- Terminal Repeat Sequences * MeSH
- Nucleic Acid Conformation MeSH
- Evolution, Molecular MeSH
- Moths genetics MeSH
- Recombination, Genetic MeSH
- Retroelements * MeSH
- RNA, Ribosomal, 5S genetics MeSH
- Plants genetics parasitology MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- Retroelements * MeSH
- RNA, Ribosomal, 5S MeSH
Plant genomes vary greatly in composition and size mainly due to the diversity of repetitive DNAs and the inherent propensity for their amplification and removal from the host genome. Most studies addressing repeatome dynamics focus on model organisms, whereas few provide comprehensive investigations across the genomes of related taxa. Herein, we analyze the evolution of repeats of the 13 species in Melampodium sect. Melampodium, representing all but two of its diploid taxa, in a phylogenetic context. The investigated genomes range in size from 0.49 to 2.27 pg/1C (ca. 4.5-fold variation), despite having the same base chromosome number (x = 10) and very strong phylogenetic affinities. Phylogenetic analysis performed in BEAST and ancestral genome size reconstruction revealed mixed patterns of genome size increases and decreases across the group. High-throughput genome skimming and the RepeatExplorer pipeline were utilized to determine the repeat families responsible for the differences in observed genome sizes. Patterns of repeat evolution were found to be highly correlated with phylogenetic position, namely taxonomic series circumscription. Major differences found were in the abundances of the SIRE (Ty1-copia), Athila (Ty3-gypsy), and CACTA (DNA transposon) lineages. Additionally, several satellite DNA families were found to be highly group-specific, although their overall contribution to genome size variation was relatively small. Evolutionary changes in repetitive DNA composition and genome size were complex, with independent patterns of genome up- and downsizing throughout the evolution of the analyzed diploids. A model-based analysis of genome size and repetitive DNA composition revealed evidence for strong phylogenetic signal and differential evolutionary rates of major lineages of repeats in the diploid genomes.
- Keywords
- Bayesian analysis, Melampodium, ancestral state reconstruction, genome size, phylogenetics, repetitive DNA, tandem repeats, transposable elements
- Publication type
- Journal Article MeSH
Satellite DNA (satDNA) is the most variable fraction of the eukaryotic genome. Related species share a common ancestral satDNA library, and a change in any library component in a particular lineage results in interspecific differences. Although the general developmental trend is clear, our knowledge of the origin and dynamics of satDNAs is still fragmentary. Here, we explore whole genome shotgun Illumina reads using the RepeatExplorer (RE) pipeline to infer satDNA family life stories in the genomes of Chenopodium species. The seven diploids studied represent separate lineages and provide an example of a species complex typical for angiosperms. Similarity searches with the RE pipeline allowed us to identify a satDNA family with a basic monomer of ~40 bp and to trace its transformation from the reconstructed ancestral sequence to the species-specific sequences. As a result, three types of satDNA family evolutionary development were distinguished: (i) concerted evolution with mutation and recombination events; (ii) concerted evolution with a trend toward increased complexity and length of the satellite monomer; and (iii) non-concerted evolution, with low levels of homogenization and multidirectional trends. The third type is an example of entire repeatome transformation, thus producing a novel set of satDNA families, and genomes showing non-concerted evolution are proposed as a significant source of genomic diversity.
- Keywords
- genome evolution, high order repeats, next-generation sequencing, plants, satellite DNA
- MeSH
- Chenopodium genetics MeSH
- Diploidy MeSH
- DNA, Plant genetics MeSH
- Species Specificity MeSH
- Phylogeny MeSH
- Genome, Plant MeSH
- Genome Components MeSH
- Evolution, Molecular MeSH
- DNA, Satellite genetics MeSH
- Sequence Analysis, DNA MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Publication type
- Journal Article MeSH
- Names of Substances
- DNA, Plant MeSH
- DNA, Satellite MeSH