• This record comes from PubMed

Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

2020 Jun 11; 10(1): 9486. [epub] 20200611

Language English Country Great Britain, England Media electronic

Document type Journal Article, Research Support, Non-U.S. Gov't

Links

PubMed 32528107
PubMed Central PMC7289789
DOI 10.1038/s41598-020-66454-3
PII: 10.1038/s41598-020-66454-3

Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge, owing to the automated feature and background selection processes built into the model. We also demonstrate MuStARD's ability to identify functional elements across species by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets to identify novel small RNA loci, both within and across species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
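
The scanning step described in the abstract can be illustrated with a minimal sketch. It is not MuStARD's code or interface: every function name here is hypothetical, and the GC-content scorer is only a stand-in for a trained network, which would additionally draw on secondary structure and conservation rather than raw sequence alone. The sketch shows only the general idea of sliding a fixed-size window along a sequence, encoding each window, and keeping windows whose score passes a threshold.

```python
from typing import Callable, Iterator, List, Tuple

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as one row of four indicators (A, C, G, T) per base.

    Any other character (for example N) becomes an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window along seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def gc_score(encoded: List[List[int]]) -> float:
    """Placeholder scorer (an assumption): the GC fraction of the window.

    A trained classifier would replace this function in practice.
    """
    if not encoded:
        return 0.0
    return sum(row[1] + row[2] for row in encoded) / len(encoded)


def scan(seq: str, size: int, step: int,
         score_fn: Callable[[List[List[int]]], float],
         threshold: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every window scoring at or above threshold."""
    hits = []
    for start, window in windows(seq, size, step):
        score = score_fn(one_hot(window))
        if score >= threshold:
            hits.append((start, start + size, score))
    return hits


# Toy example: a GC-rich stretch flanked by AT-rich sequence.
toy = "ATATATAT" + "GCGCGCGC" + "ATATATAT"
hits = scan(toy, size=8, step=4, score_fn=gc_score, threshold=0.75)
print(hits)  # [(8, 16, 1.0)]
```

Passing the scorer as a parameter keeps the windowing logic independent of the model, so any classifier that maps an encoded window to a score could be substituted for the placeholder.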

