RepeatExplorer
MOTIVATION: Repetitive DNA makes up large portions of plant and animal nuclear genomes, yet it remains the least-characterized genome component in most species studied so far. Although the recent availability of high-throughput sequencing data provides necessary resources for in-depth investigation of genomic repeats, its utility is hampered by the lack of specialized bioinformatics tools and appropriate computational resources that would enable large-scale repeat analysis to be run by biologically oriented researchers. RESULTS: Here we present RepeatExplorer, a collection of software tools for characterization of repetitive elements, which is accessible via a web interface. A key component of the server is the computational pipeline using a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads. Additional tools are provided to aid in classification of identified repeats, investigate phylogenetic relationships of retroelements and perform comparative analysis of repeat composition between multiple species. The server allows the analysis of several million sequence reads, which typically results in identification of most high and medium copy repeats in higher plant genomes.
- MeSH
- Algorithms MeSH
- DNA chemistry MeSH
- Eukaryota genetics MeSH
- Phylogeny MeSH
- Genome MeSH
- Internet MeSH
- Repetitive Sequences, Nucleic Acid * MeSH
- Sequence Analysis, DNA * MeSH
- Cluster Analysis MeSH
- Software * MeSH
- High-Throughput Nucleotide Sequencing * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- DNA MeSH
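The graph-based clustering idea from the abstract above can be illustrated with a small sketch. It treats reads as graph nodes and pairwise similarity hits as edges, and then reports each connected component as one repeat cluster together with its share of the sampled reads. This is a deliberate simplification for illustration, not the RepeatExplorer pipeline itself: the function names and the toy read identifiers are hypothetical, and connected components stand in for whatever clustering criterion the server actually applies.

```python
from collections import defaultdict


def cluster_reads(read_ids, similarity_hits):
    """Group reads into clusters (connected components of the similarity graph).

    read_ids: iterable of read identifiers (graph nodes).
    similarity_hits: iterable of (read_a, read_b) pairs that passed some
        similarity threshold (graph edges). How hits are produced is outside
        this sketch.
    """
    read_ids = list(read_ids)
    parent = {r: r for r in read_ids}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in similarity_hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for r in read_ids:
        clusters[find(r)].append(r)
    # Largest clusters first, as they correspond to the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters, total_reads):
    """Estimate each cluster's genome proportion as its share of sampled reads."""
    return [len(c) / total_reads for c in clusters]


# Toy input (hypothetical read names and hits).
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
hits = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
clusters = cluster_reads(reads, hits)
proportions = genome_proportions(clusters, len(reads))
```

Because the reads are a random sample of the genome, the fraction of reads in a cluster serves as a rough estimate of the fraction of the genome occupied by that repeat; single-read clusters such as `r6` correspond to low-copy or unique sequences.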
BACKGROUND: Plant LTR-retrotransposons are classified into two superfamilies, Ty1/copia and Ty3/gypsy. They are further divided into an enormous number of families which are, due to the high diversity of their nucleotide sequences, usually specific to a single species or a group of closely related species. Previous attempts to group these families into broader categories reflecting their phylogenetic relationships were limited either to analyzing a narrow range of plant species or to analyzing a small number of elements. Furthermore, there is no reference database that allows for similarity-based classification of LTR-retrotransposons. RESULTS: We have assembled a database of retrotransposon-encoded polyprotein domain sequences extracted from 5410 Ty1/copia elements and 8453 Ty3/gypsy elements sampled from 80 species representing major groups of green plants (Viridiplantae). Phylogenetic analysis of the three most conserved polyprotein domains (RT, RH and INT) led to dividing Ty1/copia and Ty3/gypsy retrotransposons into 16 and 14 lineages, respectively. We also characterized various features of LTR-retrotransposon sequences including additional polyprotein domains, extra open reading frames and primer binding sites, and found that the occurrence and/or type of these features correlates with phylogenies inferred from the three protein domains. CONCLUSIONS: We have established an improved classification system applicable to LTR-retrotransposons from a wide range of plant species. This system reflects phylogenetic relationships as well as distinct sequence and structural features of the elements. A comprehensive database of retrotransposon protein domains (REXdb) that reflects this classification provides a reference for efficient and unified annotation of LTR-retrotransposons in plant genomes.
REXdb-related tools can be accessed through the RepeatExplorer web server (https://repeatexplorer-elixir.cerit-sc.cz/) or via a standalone version of REXdb, which can be downloaded separately from the RepeatExplorer web page (http://repeatexplorer.org/).
- Keywords
- LTR-retrotransposons, Polyprotein domains, Primer binding site, RepeatExplorer, Transposable elements
- Publication type
- Journal Article MeSH
The characterization of unusual telomere sequences sheds light on patterns of telomere evolution, maintenance and function. Plant species from the closely related genera Cestrum, Vestia and Sessea (family Solanaceae) lack known plant telomeric sequences. Here we characterize the telomere of Cestrum elegans, work that was a challenge because of its large genome size and few chromosomes (1C 9.76 pg; n = 8). We developed an approach that combines BAL31 digestion, which digests DNA from the ends and chromosome breaks, with next-generation sequencing (NGS), to generate data analysed in RepeatExplorer, designed for de novo repeat identification and quantification. We identify a unique repeat motif (TTTTTTAGGG)n in C. elegans, occurring in ca. 30 400 copies per haploid genome, averaging ca. 1900 copies per telomere. We demonstrate that the motif is synthesized by telomerase. The occurrence of an unusual eukaryote (TTTTTTAGGG)n telomeric motif in C. elegans represents a switch in motif from the 'typical' angiosperm telomere (TTTAGGG)n. That switch may have happened with the divergence of Cestrum, Sessea and Vestia. The shift in motif when it arose would have had profound effects on telomere activity. Thus our finding provides a unique handle to study how telomerase and telomeres responded to genetic change, studies that will shed more light on telomere function.
- Keywords
- Cestrum elegans, GenBank KM573817-573822, NGS analysis, RepeatExplorer, telomerase, telomeric sequence, unusual telomere
- MeSH
- Cestrum genetics MeSH
- Chromosomes, Plant genetics MeSH
- Telomere chemistry genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
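The copy-number figures in the abstract above are related by simple arithmetic, sketched below. The only assumption added here is that every chromosome carries two telomeres, so a haploid set of n = 8 chromosomes has 16 chromosome ends. The mean array length in base pairs is derived only for illustration and is not a value reported in the abstract.

```python
# Values taken from the abstract.
haploid_chromosomes = 8              # n = 8
copies_per_haploid_genome = 30_400   # ca. 30 400 motif copies
motif = "TTTTTTAGGG"

# Assumption: two telomeres per chromosome.
telomeres_per_haploid_genome = 2 * haploid_chromosomes

copies_per_telomere = copies_per_haploid_genome / telomeres_per_haploid_genome

# Derived illustration (not stated in the abstract): mean telomeric array length.
mean_array_length_bp = copies_per_telomere * len(motif)

print(copies_per_telomere)    # 1900.0
print(mean_array_length_bp)   # 19000.0
```

Under this assumption the division reproduces the reported average of ca. 1900 copies per telomere.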
The repetitive content of the plant genome (repeatome) often represents its largest fraction and is frequently correlated with its size. Transposable elements (TEs), the main component of the repeatome, are an important driver of genome diversification due to their fast-evolving nature. Hybridization and polyploidization events are hypothesized to induce massive bursts of TEs resulting, among other effects, in an increase of copy number and genome size. Little is known about the repeatome dynamics following hybridization and polyploidization in plants that reproduce by apomixis (asexual reproduction via seeds). To address this, we analyzed the repeatomes of two diploid parental species, Hieracium intybaceum and H. prenanthoides (sexual), their synthetic diploid F1 hybrids and their natural triploid hybrids (H. pallidiflorum and H. picroides, apomictic). Using low-coverage next-generation sequencing (NGS) and a graph-based clustering approach, we detected high overall similarity across all major repeatome categories between the parental species, despite their large phylogenetic distance. Medium and highly abundant repetitive elements comprise ∼70% of Hieracium genomes; most prevalent were Ty3/Gypsy chromovirus Tekay and Ty1/Copia Maximus-SIRE elements. No TE bursts were detected in either synthetic or natural hybrids, as TE abundance generally followed theoretical expectations based on parental genome dosage. Slight over- and under-representation of TE cluster abundances reflected individual differences in genome size. However, in comparative analyses, apomicts displayed an overabundance of pararetrovirus clusters not observed in synthetic hybrids. Substantial deviations were detected in rDNAs and satellite repeats, but these patterns were sample specific. rDNA and satellite repeats (three of them were newly developed as cytogenetic markers) were localized on chromosomes by fluorescence in situ hybridization (FISH).
In a few cases, low-abundant repeats (5S rDNA and certain satellites) showed some discrepancy between NGS data and FISH results, which is due partly to the bias of low-coverage sequencing and partly to low amounts of the satellite repeats or their sequence divergence. Overall, satellite DNA (including rDNA) was markedly affected by hybridization, but independent of the ploidy or reproductive mode of the progeny, whereas bursts of TEs did not play an important role in the evolutionary history of Hieracium.
- Keywords
- RepeatExplorer, apomixis, hawkweed, hybridization, next-generation sequencing, polyploidization, repeatome
- Publication type
- Journal Article MeSH
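The abstract above compares observed TE abundance in hybrids with "theoretical expectations based on parental genome dosage". One plausible way to compute such an expectation is a DNA-weighted average of the parental repeat proportions, sketched below. The formula, the function name and all numeric values are illustrative assumptions; the abstract does not give the authors' exact calculation or any of these numbers.

```python
def expected_hybrid_proportion(parents):
    """Expected genome proportion of a repeat in a hybrid.

    parents: list of (genome_copies, genome_size, repeat_proportion) tuples,
        one per parental species, where genome_copies is the number of
        parental genomes contributed to the hybrid, genome_size is the size
        of one such genome (any consistent unit), and repeat_proportion is
        the repeat's share of that parental genome (0..1).
    """
    total_dna = sum(copies * size for copies, size, _ in parents)
    repeat_dna = sum(copies * size * prop for copies, size, prop in parents)
    return repeat_dna / total_dna


# Toy placeholder values, NOT data from the study:
# parent A carries the repeat at 10 %, parent B at 20 %, equal genome sizes.
diploid_f1 = expected_hybrid_proportion([(1, 4.0, 0.10), (1, 4.0, 0.20)])
triploid = expected_hybrid_proportion([(2, 4.0, 0.10), (1, 4.0, 0.20)])

print(round(diploid_f1, 4))  # 0.15
print(round(triploid, 4))    # 0.1333
```

An observed proportion well above such an expectation would point to amplification after hybridization, while values close to it match the "no TE burst" outcome described in the abstract.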
Satellite DNA (satDNA) is a rapidly evolving class of tandem repeats, with some monomers being involved in centromere organization and function. To identify repeats associated with (peri)centromeric regions, we investigated satDNA across Southern and Coastal clades of African annual killifishes of the genus Nothobranchius. Molecular cytogenetic and bioinformatic analyses revealed that two previously identified satellites, designated here as NkadSat01-77 and NfurSat01-348, are associated with (peri)centromeres only in one lineage of the Southern clade. NfurSat01-348 was, however, additionally detected outside centromeres in three members of the Coastal clade. We also identified a novel satDNA, NrubSat01-48, associated with (peri)centromeres in N. foerschi, N. guentheri, and N. rubripinnis. Our findings revealed fast turnover of satDNA associated with (peri)centromeres and different trends in their evolution in two clades of the genus Nothobranchius.
- Keywords
- Centromere drive, Constitutive heterochromatin, RepeatExplorer, Repetitive sequences, satDNA
- MeSH
- Centromere genetics MeSH
- Cyprinodontidae * genetics MeSH
- Fundulidae * genetics MeSH
- Evolution, Molecular MeSH
- DNA, Satellite MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- DNA, Satellite MeSH
Satellite DNA (satDNA) consists of sequences of DNA that form tandem repetitions across the genome, and it is notorious for its diversity and fast evolutionary rate. Despite its importance, satDNA has been only sporadically studied in reptile lineages. Here, we sequenced genomic DNA and PCR-amplified microdissected W chromosomes on the Illumina platform in order to characterize the monomers of satDNA from the Henkel's leaf-tailed gecko U. henkeli and to compare their topology by in situ hybridization in the karyotypes of the closely related Günther's flat-tail gecko U. guentheri and gold dust day gecko P. laticauda. We identified seventeen different satDNAs; twelve of them seem to accumulate in centromeres, telomeres and/or the W chromosome. Notably, centromeric and telomeric regions seem to share similar types of satDNAs, and we found two that seem to accumulate at both edges of all chromosomes in all three species. We speculate that the long-term stability of all-acrocentric karyotypes in geckos might be explained by the presence of specific satDNAs at the centromeric regions that are strong meiotic drivers, a hypothesis that should be further tested.
- Keywords
- FISH, Gekkonidae, RepeatExplorer, evolution, karyotype, satellite DNA
- MeSH
- Centromere * genetics MeSH
- Cytogenetic Analysis * methods MeSH
- In Situ Hybridization, Fluorescence MeSH
- Lizards * genetics MeSH
- Karyotype * MeSH
- DNA, Satellite * genetics MeSH
- Telomere * genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- DNA, Satellite * MeSH
Using African annual killifishes of the genus Nothobranchius from temporary savannah pools with rapid karyotype and sex chromosome evolution, we analysed the chromosomal distribution of telomeric (TTAGGG)n repeat and Nfu-SatC satellite DNA (satDNA; isolated from Nothobranchius furzeri) in 15 species across the Nothobranchius killifish phylogeny, and with Fundulosoma thierryi as an out-group. Our fluorescence in situ hybridization experiments revealed that all analysed taxa share the presence of Nfu-SatC repeat but with diverse organization and distribution on chromosomes. The Nfu-SatC landscape was similar in conspecific populations of Nothobranchius guentheri and Nothobranchius melanospilus but differed slightly to moderately between populations of Nothobranchius pienaari, and between the closely related Nothobranchius kuhntae and Nothobranchius orthonotus. Inter-individual variability in Nfu-SatC patterns was found in N. orthonotus and Nothobranchius krysanovi. The distribution of the studied repetitive DNA was mostly not sex-linked. Only in Nothobranchius brieni, which possesses multiple sex chromosomes, did the Nfu-SatC repeat occupy a substantial portion of the neo-Y chromosome, as previously found in the XY sex chromosome system of the turquoise killifish N. furzeri and its sister species Nothobranchius kadleci, neither of which is closely related to N. brieni. All studied species further shared patterns of expected telomeric repeats at the ends of all chromosomes and no additional interstitial telomeric sites. In summary, we revealed (i) the presence of a conserved satDNA class in Nothobranchius clades (a rare pattern among ray-finned fishes); (ii) independent trajectories of Nothobranchius sex chromosome differentiation, with recurrent and convergent accumulation of Nfu-SatC on the Y chromosome in some species; and (iii) a genus-wide shared tendency to lose telomeric repeats during interchromosomal rearrangements.
Collectively, our findings advance our understanding of genome structure, mechanisms of karyotype reshuffling, and sex chromosome differentiation in Nothobranchius killifishes from the genus-wide perspective.
- Keywords
- FISH, RepeatExplorer, chromosome fusion, chromosome polymorphism, repeatome, sex chromosome
- MeSH
- Cyprinodontidae MeSH
- Cyprinodontiformes * MeSH
- Fundulus heteroclitus MeSH
- In Situ Hybridization, Fluorescence MeSH
- Karyotype MeSH
- DNA, Satellite * genetics MeSH
- Animals MeSH
- Check Tag
- Animals MeSH
- Publication type
- Journal Article MeSH
- Substance names
- N-sulfo-2-aminotricarballylate MeSH Browser
- DNA, Satellite * MeSH
This article describes a novel method to identify as yet undiscovered telomere sequences, which combines next-generation sequencing (NGS) with BAL31 digestion of high molecular weight DNA. The method was applied to two groups of plants: i) dicots, genus Cestrum, and ii) monocots, Allium species (e.g. A. ursinum and A. cepa). Both groups consist of species with large genomes (tens of Gb) and a low number of chromosomes (2n=14-16), full of repeat elements. Both genera lack typical telomeric repeats and multiple studies have attempted to characterize alternative telomeric sequences. However, despite interesting hypotheses and suggestions of alternative candidate telomeres (retrotransposons, rDNA, satellite repeats), these studies have not resolved the question. In a novel approach based on the two most general features of eukaryotic telomeres, their repetitive character and sensitivity to BAL31 nuclease digestion, we have taken advantage of the capacity and current affordability of NGS in combination with the robustness of classical BAL31 nuclease digestion of chromosomal termini. While representative samples of most repeat elements were ensured by low-coverage (less than 5%) genomic shotgun NGS, candidate telomeres were identified as under-represented sequences in BAL31-treated samples.
- Keywords
- BAL31, NGS, RepeatExplorer, Tandem Repeats Finder, Tandem Repeats Merger, Telomere
- MeSH
- Allium genetics MeSH
- Cestrum genetics MeSH
- Chromosomes, Plant MeSH
- Endodeoxyribonucleases metabolism MeSH
- Genome, Plant * MeSH
- Genomics MeSH
- Sequence Analysis, DNA methods MeSH
- Telomere genetics MeSH
- High-Throughput Nucleotide Sequencing methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Endodeoxyribonucleases MeSH
- exonuclease Bal 31 MeSH Browser
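The selection step described in the abstract above, identifying candidate telomeres as sequences under-represented in BAL31-treated samples, can be sketched as a comparison of normalized read counts between an untreated and a treated library. The function name, the `max_ratio` threshold and the toy counts are hypothetical; the abstract does not specify how under-representation was scored.

```python
def bal31_depleted(control_counts, treated_counts, max_ratio=0.5):
    """Return repeats whose relative abundance drops after BAL31 digestion.

    control_counts / treated_counts: dicts mapping a repeat (or cluster) name
        to the number of reads assigned to it in each library.
    max_ratio: hypothetical cutoff; a repeat is reported when
        (treated frequency / control frequency) <= max_ratio.
    """
    control_total = sum(control_counts.values())
    treated_total = sum(treated_counts.values())
    candidates = []
    for name, n_control in control_counts.items():
        if n_control == 0:
            continue  # no baseline, ratio undefined
        control_freq = n_control / control_total
        treated_freq = treated_counts.get(name, 0) / treated_total
        ratio = treated_freq / control_freq
        if ratio <= max_ratio:
            candidates.append((name, ratio))
    # Most strongly depleted repeats first.
    return sorted(candidates, key=lambda item: item[1])


# Toy placeholder counts, NOT data from the study.
control = {"tandem_A": 100, "retro_B": 500, "rDNA_C": 400}
treated = {"tandem_A": 20, "retro_B": 520, "rDNA_C": 460}
candidates = bal31_depleted(control, treated)
print([name for name, _ in candidates])  # ['tandem_A']
```

Normalizing by library size matters because the two libraries are sequenced independently; only a repeat whose share of reads falls after digestion, here the terminally located `tandem_A`, is flagged as a telomere candidate.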
Homomorphic sex chromosomes and their turnover are common in teleosts. We investigated the evolution of nascent sex chromosomes in several populations of two sister species of African annual killifishes, Nothobranchius furzeri and N. kadleci, focusing on their under-studied repetitive landscape. We combined bioinformatic analyses of the repeatome with molecular cytogenetic techniques, including comparative genomic hybridization, fluorescence in situ hybridization with satellite sequences, ribosomal RNA genes (rDNA) and bacterial artificial chromosomes (BACs), and immunostaining of SYCP3 and MLH1 proteins to mark lateral elements of synaptonemal complexes and recombination sites, respectively. Both species share the same heteromorphic XY sex chromosome system, which thus evolved prior to their divergence. This was corroborated by sequence analysis of a putative master sex determining (MSD) gene gdf6Y in both species. Based on their divergence, differentiation of the XY sex chromosome pair started approximately 2 million years ago. In all populations, the gdf6Y gene mapped within a region rich in satellite DNA on the Y chromosome long arms. Despite their heteromorphism, X and Y chromosomes mostly pair regularly in meiosis, implying synaptic adjustment. In N. kadleci, Y-linked paracentric inversions like those previously reported in N. furzeri were detected. An inversion involving the MSD gene may suppress occasional recombination in the region, which we otherwise evidenced in the N. furzeri population MZCS-121 of the Limpopo clade lacking this inversion. Y chromosome centromeric repeats were reduced compared with the X chromosome and autosomes, which points to a role of relaxed meiotic drive in shaping the Y chromosome repeat landscape. We speculate that the recombination rate between sex chromosomes was reduced due to heterochiasmy. 
The observed differences between the repeat accumulations on the X and Y chromosomes probably result from high repeat turnover and may not relate closely to the divergence inferred from earlier SNP analyses.
- Keywords
- Inversion, Recombination suppression, RepeatExplorer, Repeatome, Sex chromosome degeneration, Sex chromosome polymorphism
- MeSH
- Africans MeSH
- Y Chromosome genetics MeSH
- Cyprinodontidae * genetics MeSH
- Fundulidae * genetics MeSH
- In Situ Hybridization, Fluorescence MeSH
- Humans MeSH
- Evolution, Molecular MeSH
- Sex Chromosomes genetics MeSH
- Comparative Genomic Hybridization MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Knowledge of the fascinating world of DNA repeats is continuously being enriched by newly identified elements and their hypothetical or well-established biological relevance. Genomic approaches can be used for comparative studies of major repeats in any group of genomes, regardless of their size and complexity. Such studies are particularly fruitful in large genomes, and useful mainly in crop plants where they provide a rich source of molecular markers or information on indispensable genomic components (e.g., telomeres, centromeres, or ribosomal RNA genes). Surprisingly, in Allium species, a comprehensive comparative study of repeats is lacking. Here we provide such a study of two economically important species, Allium cepa (onion) and A. sativum (garlic), and their distant relative A. ursinum (wild garlic). We present an overview and classification of major repeats in these species and have paid specific attention to sequence conservation and copy numbers of major representatives in each type of repeat, including retrotransposons, rDNA, or newly identified satellite sequences. The prevailing repeats in all three studied species belonged to Ty3/gypsy elements; however, they have diverged significantly and we did not detect them in common clusters in comparative analysis. In fact, only a small number of clusters were shared by all three species. Such conserved repeats were, for example, 5S and 45S rDNA genes and, surprisingly, a specific and quite rare Ty1/copia lineage. Species-specific long satellites were found mainly in A. cepa and A. sativum. We also show in situ localization of selected repeats that could potentially be applicable as chromosomal markers, e.g., in interspecific breeding.
- Keywords
- Allium, RepeatExplorer, TAREAN, plant genome, rDNA, repeats, retrotransposon, satellite, telomere
- MeSH
- Allium classification genetics MeSH
- Chromosomes, Plant MeSH
- Genome, Plant * MeSH
- Genomics * methods MeSH
- In Situ Hybridization, Fluorescence MeSH
- Nucleotide Motifs MeSH
- Retroelements MeSH
- DNA, Satellite MeSH
- Tandem Repeat Sequences MeSH
- Telomere MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Retroelements MeSH
- DNA, Satellite MeSH