Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2
Language: English
Country: Great Britain, England
Medium: print-electronic
Document type: Journal Article; Research Support, Non-U.S. Gov't
Grant support
LM2015047
Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports) - International
CZ.02.1.01/0.0/0.0/16_013/0001777
Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports) - International
PubMed: 33097925
DOI: 10.1038/s41596-020-0400-y
PII: 10.1038/s41596-020-0400-y
- MeSH
- DNA Probes / chemistry / genetics
- DNA / chemistry / genetics
- Genomics / methods
- Humans
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA / methods
- Cluster Analysis
- Software
- DNA Transposable Elements
- High-Throughput Nucleotide Sequencing / methods
- Animals
- Check Tag
- Humans
- Animals
- Publication Type
- Journal Article
- Research Support, Non-U.S. Gov't
- Substance Names
- DNA Probes
- DNA
- DNA Transposable Elements
RepeatExplorer2 is a new version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads to characterize repetitive DNA in eukaryotes. The clustering algorithm enables repeat identification in any genome from relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for running the analyses and documenting their results. Compared with the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz.
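The idea of graph-based read clustering can be illustrated with a toy sketch. This is not RepeatExplorer2 code. As we understand it, the pipeline builds its read graph from all-to-all similarity searches and then partitions that graph into clusters. Here, as a simplifying assumption, two reads are linked when they share at least `min_shared` k-mers, and clusters are plain connected components found with union-find. Reverse complements are ignored. The helper names (`cluster_reads`, `genome_proportions`) and the thresholds are hypothetical. The fraction of analysed reads that falls in a cluster is used as a rough estimate of the genome proportion of that repeat.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in a read (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads: list, k: int = 8, min_shared: int = 3) -> list:
    """Hypothetical sketch: link reads sharing >= min_shared k-mers and
    return connected components (lists of read indices), largest first."""
    parent = list(range(len(reads)))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index: k-mer -> ascending list of reads containing it.
    index = defaultdict(list)
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            index[km].append(i)

    # Count shared k-mers for every read pair (the graph's edge weights).
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(ids, 2):
            shared[(a, b)] += 1

    # Keep only sufficiently similar pairs as edges and merge their components.
    for (a, b), n in shared.items():
        if n >= min_shared:
            union(a, b)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values(), key=len, reverse=True)


def genome_proportions(clusters: list, total_reads: int) -> list:
    """Fraction of analysed reads in each cluster (a rough abundance estimate)."""
    return [len(c) / total_reads for c in clusters]


if __name__ == "__main__":
    # Toy data: four reads sampled from a tandem repeat, two unrelated reads.
    satellite = "ACGTTGCAAGGCTTAC" * 4
    reads = [satellite[0:30], satellite[5:35], satellite[10:40], satellite[20:50],
             "TTTTTTTTTTGGGGGGGGGG", "CACACACACAGTGTGTGTGT"]
    clusters = cluster_reads(reads)
    print(clusters)
    print(genome_proportions(clusters, len(reads)))
```

Highly repeated sequences are sampled many times even at low sequencing coverage, so their reads form large clusters, while reads from low-copy regions mostly remain singletons. This is why a small sample of short reads can be enough for repeat identification.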
Contrasting distributions and expression characteristics of transcribing repeats in Setaria viridis
A chromosome-scale reference genome of grasspea (Lathyrus sativus)
Fast satellite DNA evolution in Nothobranchius annual killifishes
Analysis of 5S rDNA Genomic Organization Through the RepeatExplorer2 Pipeline: A Simplified Protocol
Holocentromeres can consist of merely a few megabase-sized satellite arrays
Disruption of the standard kinetochore in holocentric Cuscuta species
Telomerase RNA in Hymenoptera (Insecta) switched to plant/ciliate-like biogenesis
The ecology of palm genomes: repeat-associated genome size expansion is constrained by aridity
Genome diploidization associates with cladogenesis, trait disparity, and plastid gene evolution