GERONIMO: A tool for systematic retrieval of structural RNAs in a broad evolutionary context
Language: English; Country: United States; Medium: print-electronic
Document type: journal article, grant-supported research
PubMed: 37848616
PubMed Central: PMC10580375
DOI: 10.1093/gigascience/giad080
PII: 7319579
- Keywords
- Snakemake, evolution, high-throughput pipeline, sequence homology searches
- MeSH
- Algorithms * MeSH
- Phylogeny MeSH
- Genomics MeSH
- RNA, Untranslated: genetics, chemistry MeSH
- RNA * MeSH
- Sequence Alignment MeSH
- Publication type
- Journal Article MeSH
- Grant-supported research MeSH
- Substance names
- RNA, Untranslated MeSH
- RNA * MeSH
BACKGROUND: Web-based tools such as BLAST make identifying conserved gene homologs appear easy, but genes with variable sequences remain challenging. Functionally important noncoding RNAs (ncRNAs) often show low sequence conservation because of genetic variation, including insertions and deletions. What these RNAs conserve across a broad phylogenetic range is structure rather than sequence. Such structural features can be identified with covariance models, which combine a sequence alignment with a consensus RNA secondary structure. However, running the standard implementation of this approach (Infernal) requires advanced bioinformatics knowledge, unlike user-friendly web services such as BLAST. RNAcentral partially addresses the issue: it can search for homologs across a broad range of ncRNA sequence collections from diverse organisms, but not across genome assemblies.
RESULTS: Here, we present GERONIMO, which conducts evolutionary searches across hundreds of genomes in a fully automated way. It reports results with taxonomic context, as summary tables and visualizations, to facilitate analysis. Additionally, GERONIMO supplements homologous sequences with genomic regions so that promoter motifs or gene collinearity can be analyzed, which helps validate the results.
CONCLUSION: GERONIMO, built with Snakemake, has been extensively tested on hundreds of genomes and is a valuable tool for identifying ncRNA homologs across diverse taxonomic groups. It thereby facilitates the investigation of the evolutionary patterns of functionally significant ncRNAs, whose understanding has previously been limited to individual organisms and their close relatives.