Teaching transposon classification as a means to crowd source the curation of repeat annotation - a tardigrade perspective

2024 May 06; 15(1): 10. [epub] 20240506

Status: PubMed-not-MEDLINE. Language: English. Country: Great Britain, England. Medium: electronic

Document type: journal articles

Persistent link   https://www.medvik.cz/link/pmid38711146
Links

PubMed 38711146
PubMed Central PMC11071193
DOI 10.1186/s13100-024-00319-8
PII: 10.1186/s13100-024-00319-8
Knihovny.cz E-resources

BACKGROUND: The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year, providing unprecedented resources for the study of genome evolution. Within this context, in-depth analyses of repetitive elements, transposable elements (TEs) in particular, are increasingly recognized as important for understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential because automatically generated consensus sequences are frequently incomplete.

RESULTS: Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species, built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries.

CONCLUSIONS: The collaborative manual curation of TEs from two tardigrade species, for which no TE libraries were available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects. A hidden treasure awaits discovery within non-model organisms.

Anglia Ruskin University East Rd Cambridge CB1 1PT UK

Berlin Center for Genomics in Biodiversity Research 14195 Berlin Germany

Biological and Environmental Science and Engineering Division King Abdullah University of Science and Technology Thuwal Saudi Arabia

Centogene GmbH Am Strande 7 18055 Rostock Germany

Centre for Molecular Biodiversity Research Leibniz Institute for the Analysis of Biodiversity Change Adenauerallee 127 53113 Bonn Germany

Departamento de Biodiversidad y Biología Evolutiva Museo Nacional de Ciencias Naturales José Gutiérrez Abascal 2 Madrid 28006 Spain

Department of Bioinformatics and Genetics Swedish Natural History Museum Stockholm Sweden

Department of Biological and Environmental Science University of Jyväskylä P O Box 35 Jyväskylä 40014 Finland

Department of Biological Geological and Environmental Science University of Bologna Via Selmi 3 Bologna 40126 Italy

Department of Biological Sciences University of Notre Dame Notre Dame IN 46556 USA

Department of Biotechnology National Institute of Technology Durgapur Durgapur India

Department of Botany Jagannath University Dhaka 1100 Bangladesh

Department of Ecology and Evolution The University of Chicago Chicago IL 60637 USA

Department of Ecology and Evolutionary Biology University of California Los Angeles Los Angeles CA United States of America

Department of Ecology and Genetics Uppsala University Uppsala Sweden

Department of Ecology Faculty of Science Charles University Prague Czech Republic

Department of Genetics Environment and Evolution Centre for Biodiversity and Environment Research University College London London UK

Department of Organismal Biology Systematic Biology Evolutionary Biology Centre Uppsala University Uppsala SE 752 36 Sweden

Department of Systematic and Evolutionary Botany University of Zurich Zurich Switzerland

Eurofins Genomics Europe Pharma and Diagnostics Products and Services Sales GmbH Ebersberg Germany

Evolutionary Biology and Ecology University of Freiburg Freiburg Germany

Evolutionary Genetics Department Leibniz Institute for Zoo and Wildlife Research 10315 Berlin Germany

German Cancer Research Center NGS Core Facility DKFZ ZMBH Alliance 69120 Heidelberg Germany

INBIOS Conservation Genetic Lab University of Liege Liege Belgium

Institute of Botany Czech Academy of Sciences Průhonice Czech Republic

Institute of Evolution and Ecology University of Tuebingen Tuebingen Germany

Institute of Evolutionary Biology Faculty of Biology Biological and Chemical Research Centre University of Warsaw Warsaw Poland

Institute of Genetics and Biotechnology Hungarian University of Agriculture and Life Sciences Budapest Hungary

Institute of Hydrobiology Biology Centre of the Czech Academy of Sciences České Budějovice Czech Republic

LOEWE Centre for Translational Biodiversity Genomics Senckenberganlage 25 60325 Frankfurt Germany

Molecular Ecology Group Verbania Italy

Natural History Museum Oslo University Oslo Norway

New York University Abu Dhabi Saadiyat Island United Arab Emirates

Physalia courses 10249 Berlin Germany

Plant Pathology Group Institute of Integrative Biology ETH Zurich Zurich Switzerland

Present address Centre for Molecular Biodiversity Research Leibniz Institute for the Analysis of Biodiversity Change Adenauerallee 160 53113 Bonn Germany

Reed College Portland OR United States of America

Research Unit Comparative Microbiome Analysis Helmholtz Zentrum München Ingolstädter Landstraße 1 D 85764 Neuherberg Germany

Royal Botanic Gardens Kew Richmond Surrey TW9 3AE UK

School of Biological Sciences University of East Anglia Norwich Research Park Norwich NR4 7TU UK

Skolkovo Institute of Science and Technology Moscow Russia

Swiss Ornithological Institute Vogelwarte Sempach CH 6204 Switzerland

The Natural History Museum Cromwell Road London SW6 7SJ UK

Tree of Life Wellcome Sanger Institute Cambridge CB10 1SA UK

University of Arizona Tucson AZ USA

Zell und Molekularbiologie der Pflanzen Technische Universität Dresden Dresden Germany

