Teaching transposon classification as a means to crowd source the curation of repeat annotation - a tardigrade perspective
Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic
Document type: Journal article
PubMed: 38711146
PubMed Central: PMC11071193
DOI: 10.1186/s13100-024-00319-8
PII: 10.1186/s13100-024-00319-8
- Keywords: Annotation, Genome assembly, Library, Manual curation, Non-model organism, Transposable elements
- Publication type: Journal article (MeSH)
BACKGROUND: Advances in sequencing technologies have led to the rapid release of hundreds of new genome assemblies per year, providing unprecedented resources for the study of genome evolution. In this context, in-depth analyses of repetitive elements, transposable elements (TEs) in particular, are increasingly recognized as essential to understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential because automatically generated consensus sequences are frequently incomplete.

RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating the TE libraries of two non-model species, built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups, where the methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning curation methods effectively and empowering the scientific community with high-quality, reusable repeat libraries.

CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which no TE libraries were previously available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: a hidden treasure awaits discovery within non-model organisms.
Anglia Ruskin University East Rd Cambridge CB1 1PT UK
Berlin Center for Genomics in Biodiversity Research 14195 Berlin Germany
Centogene GmbH Am Strande 7 18055 Rostock Germany
Department of Bioinformatics and Genetics Swedish Natural History Museum Stockholm Sweden
Department of Biological Sciences University of Notre Dame Notre Dame IN 46556 USA
Department of Biotechnology National Institute of Technology Durgapur Durgapur India
Department of Botany Jagannath University Dhaka 1100 Bangladesh
Department of Ecology and Evolution The University of Chicago Chicago IL 60637 USA
Department of Ecology and Genetics Uppsala University Uppsala Sweden
Department of Ecology Faculty of Science Charles University Prague Czech Republic
Department of Systematic and Evolutionary Botany University of Zurich Zurich Switzerland
Eurofins Genomics Europe Pharma and Diagnostics Products and Services Sales GmbH Ebersberg Germany
Evolutionary Biology and Ecology University of Freiburg Freiburg Germany
German Cancer Research Center NGS Core Facility DKFZ ZMBH Alliance 69120 Heidelberg Germany
INBIOS Conservation Genetic Lab University of Liege Liege Belgium
Institute of Botany Czech Academy of Sciences Průhonice Czech Republic
Institute of Evolution and Ecology University of Tuebingen Tuebingen Germany
LOEWE Centre for Translational Biodiversity Genomics Senckenberganlage 25 60325 Frankfurt Germany
Molecular Ecology Group Verbania Italy
Natural History Museum Oslo University Oslo Norway
New York University Abu Dhabi Saadiyat Island United Arab Emirates
Physalia courses 10249 Berlin Germany
Plant Pathology Group Institute of Integrative Biology ETH Zurich Zurich Switzerland
Reed College Portland OR United States of America
Royal Botanic Gardens Kew Richmond Surrey TW9 3AE UK
School of Biological Sciences University of East Anglia Norwich Research Park Norwich NR4 7TU UK
Skolkovo Institute of Science and Technology Moscow Russia
Swiss Ornithological Institute Vogelwarte Sempach CH 6204 Switzerland
The Natural History Museum Cromwell Road London SW6 7SJ UK
Tree of Life Wellcome Sanger Institute Cambridge CB10 1SA UK
University of Arizona Tucson AZ USA
Zell und Molekularbiologie der Pflanzen Technische Universität Dresden Dresden Germany