Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
Language: English. Country: England, United Kingdom. Medium: electronic
Document type: journal article, grant-supported research
PubMed: 32528107
PubMed Central: PMC7289789
DOI: 10.1038/s41598-020-66454-3
PII: 10.1038/s41598-020-66454-3
- MeSH
- algorithms MeSH
- genomics, methods MeSH
- humans MeSH
- small nucleolar RNA, genetics MeSH
- microRNAs, genetics MeSH
- mice MeSH
- non-coding RNA, genetics MeSH
- neural networks MeSH
- software MeSH
- computational biology, methods MeSH
- animals MeSH
- Check Tag
- humans MeSH
- mice MeSH
- animals MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- small nucleolar RNA MeSH
- microRNAs MeSH
- non-coding RNA MeSH
Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci without the need for domain-specific knowledge, owing to the automated feature and background selection processes built into the model. We also demonstrate MuStARD's capacity for inter-species identification of functional elements by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets to identify novel small RNA loci, both intra- and inter-species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
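The scanning step mentioned in the abstract can be pictured as a sliding window passed over a long sequence, with each window encoded and scored by a classifier. The sketch below illustrates only that general idea and is not MuStARD's code. All function names are hypothetical, the window size, step, and threshold defaults are illustrative assumptions, and `gc_fraction` is a toy scorer standing in for a trained network. The structure and conservation inputs named in the abstract are omitted.

```python
from typing import Callable, Iterator, List, Tuple

ALPHABET = "ACGT"

def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as 4-element rows; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq.upper()]

def sliding_windows(seq: str, size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs for fixed-size windows along seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]

def gc_fraction(encoded: List[List[int]]) -> float:
    """Toy scorer (assumption): fraction of G/C bases, standing in for a trained model."""
    if not encoded:
        return 0.0
    return sum(row[1] + row[2] for row in encoded) / len(encoded)

def scan(
    seq: str,
    score_fn: Callable[[List[List[int]]], float],
    size: int = 100,        # illustrative default, not taken from the paper
    step: int = 5,          # illustrative default, not taken from the paper
    threshold: float = 0.5, # illustrative default, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for every window scoring at or above threshold."""
    hits = []
    for start, window in sliding_windows(seq, size, step):
        score = score_fn(one_hot(window))
        if score >= threshold:
            hits.append((start, start + size, score))
    return hits
```

For example, `scan("AAAAGCGCAAAA", gc_fraction, size=4, step=4, threshold=0.9)` reports only the middle window. In a real setting, `score_fn` would wrap a trained model's prediction call, and overlapping hits would typically be merged into candidate loci.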
Central European Institute of Technology, Brno, Czech Republic
Faculty of Science, National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
Genomic benchmarks: a collection of datasets for genomic sequence classification
miRBind: A Deep Learning Method for miRNA Binding Classification
ENNGene: an Easy Neural Network model building tool for Genomics
PENGUINN: Precise Exploration of Nuclear G-Quadruplexes Using Interpretable Neural Networks