rboAnalyzer: A Software to Improve Characterization of Non-coding RNAs From Sequence Database Search Output
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic-ecollection
Document type: journal article
PubMed: 32849767
PubMed Central: PMC7401326
DOI: 10.3389/fgene.2020.00675
- Keywords
- RNA, RNA homology, database, search, secondary structure, sequence
- Publication type
- journal article
Searching for similar sequences in a database with BLAST or a similar tool is one of the most common tasks in bioinformatics in general, and in the study of non-coding RNAs in particular. However, the search results can be difficult to interpret because they often contain only partial matches to the database subject sequences. Here, we present rboAnalyzer, a tool that helps with interpreting sequence search results by (1) extending partial matches into plausible full-length subject sequences, (2) predicting homology of the RNAs represented by the full-length subject sequences to the query RNA, (3) pooling information across homologous RNAs found in the search results and in public databases such as Rfam to predict more reliable secondary structures for all matches, and (4) contextualizing the matches by providing the prediction results and other relevant information in a rich graphical output. Using predicted full-length matches improves secondary structure prediction and makes rboAnalyzer robust with regard to identification of homology. The output of the tool should help the user to reliably characterize non-coding RNAs in BLAST output. The usefulness of rboAnalyzer and its ability to correctly extend partial matches to full length are demonstrated on known homologous RNAs. To allow the user to use custom databases and search options, rboAnalyzer accepts any search results as a text file in the BLAST format. The main output is an interactive HTML page displaying the computed characteristics and other context of the matches. The output can also be exported in appropriate sequence and/or secondary structure formats.
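To make step (1) concrete, the following is a minimal sketch of one naive way a partial match could be extended toward a full-length subject region, using only the match coordinates. It is an illustration under stated assumptions and not a description of how rboAnalyzer itself performs the extension: the `Hsp` class, the function `extend_to_full_length`, and all coordinate values are hypothetical, coordinates are taken to be 1-based and inclusive, and only plus-strand matches without gaps are considered.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    """One partial match (hypothetical container; 1-based, inclusive, plus strand)."""
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int


def extend_to_full_length(hsp, query_length, subject_length):
    """Return (start, end) of the subject region expected to cover the whole query.

    The subject interval is widened by the number of query positions left
    unaligned on each side, then clipped to the subject sequence boundaries.
    """
    missing_left = hsp.query_start - 1
    missing_right = query_length - hsp.query_end
    start = max(1, hsp.subject_start - missing_left)
    end = min(subject_length, hsp.subject_end + missing_right)
    return start, end


# A match covering query positions 11-80 of a 100 nt query.
hsp = Hsp(query_start=11, query_end=80, subject_start=1005, subject_end=1074)
print(extend_to_full_length(hsp, query_length=100, subject_length=5000))
# -> (995, 1094)
```

A coordinate-only extension like this ignores insertions and deletions between the query and the subject, so the resulting region is only a candidate that would still need to be checked, for example by realigning it to the query.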