Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
Language: English
Country: Great Britain, England
Media: electronic
Document type: Journal Article, Research Support, Non-U.S. Gov't
PubMed: 32528107
PubMed Central: PMC7289789
DOI: 10.1038/s41598-020-66454-3
PII: 10.1038/s41598-020-66454-3
- MeSH
- Algorithms MeSH
- Genomics methods MeSH
- Humans MeSH
- RNA, Small Nucleolar genetics MeSH
- MicroRNAs genetics MeSH
- Mice MeSH
- RNA, Untranslated genetics MeSH
- Neural Networks, Computer MeSH
- Software MeSH
- Computational Biology methods MeSH
- Animals MeSH
- Check Tag
- Humans MeSH
- Mice MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Names of Substances
- RNA, Small Nucleolar MeSH
- MicroRNAs MeSH
- RNA, Untranslated MeSH
Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge, owing to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD to identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets for the identification of novel small RNA loci, both within and across species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
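The scanning idea described in the abstract can be illustrated with a minimal sketch. It is not MuStARD's code: the function names, window size, step, and threshold are all hypothetical, the secondary-structure branch is omitted for brevity, and `toy_score` is a hand-written stand-in for a trained multi-branch network. The sketch only shows how a region might be split into overlapping windows, how each window's sequence and conservation tracks could be passed as separate inputs, and how high-scoring windows are kept.

```python
from typing import Callable, List, Sequence, Tuple

# One-hot rows in the order A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

Row = Tuple[int, int, int, int]


def one_hot(seq: str) -> List[Row]:
    """Encode DNA as one-hot rows; unknown bases (e.g. N) become all zeros."""
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def scan(
    seq: str,
    conservation: Sequence[float],
    score_fn: Callable[[List[Row], Sequence[float]], float],
    window: int = 100,      # hypothetical window length
    step: int = 5,          # hypothetical stride between windows
    threshold: float = 0.5, # hypothetical score cut-off
) -> List[Tuple[int, int, float]]:
    """Slide a fixed-size window over a region and keep windows scoring >= threshold.

    Each window is presented to score_fn as two separate inputs (branches):
    the one-hot sequence and the per-base conservation values.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        score = score_fn(one_hot(seq[start:end]), conservation[start:end])
        if score >= threshold:
            hits.append((start, end, score))
    return hits


def toy_score(seq_branch: List[Row], cons_branch: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained model: GC fraction times mean conservation."""
    gc = sum(row[1] + row[2] for row in seq_branch) / len(seq_branch)
    cons = sum(cons_branch) / len(cons_branch)
    return gc * cons


# Usage: a 60-base toy region with a GC-rich block in the middle.
region = "AT" * 10 + "GC" * 10 + "AT" * 10
cons_track = [1.0] * len(region)
hits = scan(region, cons_track, toy_score, window=20, step=10, threshold=0.9)
```

In a real setting the scoring function would be a trained classifier, and overlapping hits would typically be merged into candidate loci.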
Central European Institute of Technology Brno Czech Republic
Faculty of Science National Centre for Biomolecular Research Masaryk University Brno Czech Republic