Most cited article - PubMed ID 19384338
Evolutionary conserved lineage of Angela-family retrotransposons as a genome-wide microsatellite repeat dispersal agent
DNA transposons are defined as repeated DNA sequences that can move within the host genome through the action of transposases. The transposon superfamily Merlin was originally found mainly in animal genomes. Here, we describe a global distribution of Merlin in animals, fungi, plants and protists, reporting for the first time its presence in Rhodophyceae, Metamonada, Discoba and Alveolata. We identified a great variety of potentially active Merlin families, some containing highly imperfect terminal inverted repeats and internal tandem repeats. Merlin-related sequences with no evidence of mobilization capacity were also observed and may be products of domestication. The evolutionary trees support the view that Merlin is likely an ancient superfamily, with early events of diversification and secondary losses, although repeated re-invasions probably occurred in some groups, which would explain its diversity and discontinuous distribution. We cannot rule out the possibility that the Merlin superfamily is the product of multiple horizontal transfers of related prokaryotic insertion sequences. Moreover, this is the first account of a DNA transposon in kinetoplastid flagellates, with a conserved Merlin transposase identified in Bodo saltans and Perkinsela sp., whereas it is absent in trypanosomatids. Based on the level of conservation of the transposase and overlaps of putative open reading frames with Merlin, we propose that in protists it may serve as a raw material for gene emergence.
- MeSH
- Alveolata genetics MeSH
- Eukaryota genetics MeSH
- phylogeny MeSH
- Kinetoplastida genetics MeSH
- molecular evolution MeSH
- neurofibromin 2 genetics MeSH
- polymerase chain reaction MeSH
- DNA transposable elements genetics MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- neurofibromin 2 MeSH
- DNA transposable elements MeSH
Retrotransposable elements are widely distributed and diverse in eukaryotes. Their copy number increases through reverse-transcription-mediated propagation, while they can be lost through recombinational processes, generating genomic rearrangements. We previously identified extensive structurally uniform retrotransposon groups in which no member contains the gag, pol, or env internal domains. Because of the lack of protein-coding capacity, these groups are non-autonomous in replication, even if transcriptionally active. The Cassandra element belongs to the non-autonomous group called terminal-repeat retrotransposons in miniature (TRIM). It carries 5S RNA sequences with conserved RNA polymerase (pol) III promoters and terminators in its long terminal repeats (LTRs). Here, we identified multiple extended tandem arrays of Cassandra retrotransposons within different plant species, including ferns. At least 12 copies of repeated LTRs (as the tandem unit) and internal domain (as a spacer), giving a pattern that resembles the cellular 5S rRNA genes, were identified. A cytogenetic analysis revealed the specific chromosomal pattern of the Cassandra retrotransposon with prominent clustering at and around 5S rDNA loci. The secondary structure of the Cassandra retroelement RNA is predicted to form super-loops, in which the two LTRs are complementary to each other and can initiate local recombination, leading to the tandem arrays of Cassandra elements. The array structures are conserved for Cassandra retroelements of different species. We speculate that recombination events similar to those of 5S rRNA genes may explain the wide variation in Cassandra copy number. Likewise, the organization of 5S rRNA gene sequences is very variable in flowering plants; part of what is taken for 5S gene copy variation may be variation in Cassandra number. The role of the Cassandra 5S sequences remains to be established.
- Keywords
- 5S RNA gene, Cassandra TRIM, ectopic recombination, genome evolution, long tandem array, retrotransposon
- MeSH
- insect chromosomes MeSH
- phylogeny MeSH
- plant genome MeSH
- genomics methods MeSH
- host-parasite interactions genetics MeSH
- terminal repeat sequences * MeSH
- nucleic acid conformation MeSH
- molecular evolution MeSH
- moths genetics MeSH
- genetic recombination MeSH
- retroelements * MeSH
- ribosomal 5S RNA genetics MeSH
- plants genetics parasitology MeSH
- animals MeSH
- Check Tag
- animals MeSH
- Publication type
- journal article MeSH
- Substance names
- retroelements * MeSH
- ribosomal 5S RNA MeSH
The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
- MeSH
- genome size * MeSH
- Fabaceae classification genetics MeSH
- phylogeny MeSH
- genetic variation * MeSH
- plant genome * MeSH
- genomics * methods MeSH
- terminal repeat sequences MeSH
- molecular evolution MeSH
- repetitive nucleic acid sequences * MeSH
- reproducibility of results MeSH
- DNA sequence analysis MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
Microsatellites, or simple sequence repeats (SSRs), are a widespread class of repetitive DNA sequences used in population genetics, genetic diversity and mapping studies. Despite the utility of SSRs, their genetic and evolutionary mechanisms are not fully understood. We investigated three microsatellite loci at different positions in the pea (Pisum sativum L.) genome: the A9 locus, residing in the LTR region of an abundant retrotransposon; AD270, an intergenic locus; and AF016458, located in the 5' untranslated region of an expressed gene. Comparative analysis of 35 sample pairs from seven pea varieties propagated by single-seed descent for ten generations revealed a single 4 bp mutation in a 10th-generation sample at the AD270 locus, corresponding to a stepwise gain of one ATCT repeat unit. The estimated mutation rate was 4.76 × 10⁻³ per locus per generation, with a 95% confidence interval of 1.2 × 10⁻⁴ to 2.7 × 10⁻². Comparison of cv. Bohatýr accessions retrieved from different collections showed intra- and inter-accession variation as well as differences in flanking and repeat sequences. Fragment size and sequence alterations were also found in a long-term in vitro organogenic culture established in 1983, indicative of a somatic mutation process. Evidence of homoplasy was detected across unrelated pea genotypes, which adversely affects the reliability of diversity estimates not only for diverse germplasm but also for highly bred material. The findings of this study have important implications for Pisum phylogeny studies, variety identification and the registration process in pea breeding, where the mutation rate influences estimates of genetic diversity and effective population size.
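The mutation rate and confidence interval quoted in this abstract can be approximately reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the single observed mutation is divided by 210 locus-generation transmissions (a denominator chosen here because 1/210 matches the reported 4.76 × 10⁻³), and that the interval is an exact Poisson interval. The function names are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """Return P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(200):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def exact_poisson_ci(x, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given an observed count x."""
    if x == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= x) equals alpha / 2.
        lower = bisect(lambda mu: 1 - poisson_cdf(x - 1, mu) - alpha / 2, 0.0, float(x))
    # Upper limit: the mean at which P(X <= x) equals alpha / 2.
    upper_bracket = x + 10 * math.sqrt(x + 1) + 10
    upper = bisect(lambda mu: poisson_cdf(x, mu) - alpha / 2, float(x), upper_bracket)
    return lower, upper


mutations = 1        # from the abstract: a single 4 bp mutation
transmissions = 210  # ASSUMPTION: denominator inferred from the reported rate

rate = mutations / transmissions
ci_low, ci_high = (limit / transmissions for limit in exact_poisson_ci(mutations))
print(f"rate = {rate:.2e}, 95% CI = {ci_low:.1e} to {ci_high:.1e}")
```

With only one observed event the interval spans more than two orders of magnitude, which is why the reported rate is best read as an order-of-magnitude estimate.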
- MeSH
- 5' untranslated regions MeSH
- breeding MeSH
- plant DNA genetics isolation & purification MeSH
- genetic variation MeSH
- plant genome MeSH
- genotype MeSH
- Pisum sativum genetics MeSH
- intergenic DNA MeSH
- terminal repeat sequences MeSH
- microsatellite repeats * MeSH
- mutation * MeSH
- genomic instability MeSH
- retroelements MeSH
- pedigree * MeSH
- DNA sequence analysis MeSH
- plant seeds genetics MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- 5' untranslated regions MeSH
- plant DNA MeSH
- intergenic DNA MeSH
- retroelements MeSH
BACKGROUND: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. RESULTS: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. CONCLUSIONS: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
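The idea of partitioning reads by similarity and reading repeat abundance off the cluster sizes can be illustrated with a toy example. The sketch below is not the authors' algorithm: the abstract does not specify the partitioning method, so plain connected components (via union-find) are used here as a simple stand-in. The read indices, the `hits` list of pairwise similarity hits and the function names are all hypothetical.

```python
from collections import Counter


def cluster_reads(n_reads, hits):
    """Group reads into connected components of the similarity graph.

    Reads are vertices 0..n_reads-1; each (a, b) in hits is an edge meaning
    that reads a and b passed some similarity threshold (assumed to have been
    computed upstream, e.g. by an all-to-all sequence comparison).
    Returns cluster sizes in descending order.
    """
    parent = list(range(n_reads))

    def find(i):
        # Walk to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    sizes = Counter(find(i) for i in range(n_reads))
    return sorted(sizes.values(), reverse=True)


def genome_proportions(sizes, n_reads, min_size=2):
    """Estimate the genome fraction per cluster as its share of all reads.

    Clusters smaller than min_size are treated as non-repetitive and skipped.
    """
    return [size / n_reads for size in sizes if size >= min_size]


# Toy data: 10 reads and 5 similarity hits forming two clusters.
hits = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6)]
sizes = cluster_reads(10, hits)
props = genome_proportions(sizes, 10)
print(sizes, props)
```

The proportion estimate relies on low-pass reads sampling the genome roughly uniformly, so that a repeat family's share of the reads approximates its share of the genome.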
- MeSH
- plant DNA genetics MeSH
- plant genome MeSH
- Glycine max genetics MeSH
- Pisum sativum genetics MeSH
- chromosome mapping MeSH
- repetitive nucleic acid sequences * MeSH
- DNA sequence analysis * MeSH
- cluster analysis MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- plant DNA MeSH