ENNGene: an Easy Neural Network model building tool for Genomics

. 2022 Mar 31 ; 23 (1) : 248. [epub] 20220331

Jazyk angličtina Země Velká Británie, Anglie Médium electronic

Typ dokumentu časopisecké články

Perzistentní odkaz   https://www.medvik.cz/link/pmid35361122

Grantová podpora
867414 H2020 Spreading Excellence and Widening Participation
CZ.02.2.69/0.0/0.0/18 053/0016952 Masarykova Univerzita

Odkazy

PubMed 35361122
PubMed Central PMC8973509
DOI 10.1186/s12864-022-08414-x
PII: 10.1186/s12864-022-08414-x
Knihovny.cz E-zdroje

BACKGROUND: The recent big data revolution in Genomics, coupled with the emergence of Deep Learning as a set of powerful machine learning methods, has shifted the standard practices of machine learning for Genomics. Even though Deep Learning methods such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are becoming widespread in Genomics, developing and training such models is outside the ability of most researchers in the field. RESULTS: Here we present ENNGene-Easy Neural Network model building tool for Genomics. This tool simplifies training of custom CNN or hybrid CNN-RNN models on genomic data via an easy-to-use Graphical User Interface. ENNGene allows multiple input branches, including sequence, evolutionary conservation, and secondary structure, and performs all the necessary preprocessing steps, allowing simple input such as genomic coordinates. The network architecture is selected and fully customized by the user, from the number and types of the layers to each layer's precise set-up. ENNGene then deals with all steps of training and evaluation of the model, exporting valuable metrics such as multi-class ROC and precision-recall curve plots or TensorBoard log files. To facilitate interpretation of the predicted results, we deploy Integrated Gradients, providing the user with a graphical representation of an attribution level of each input position. To showcase the usage of ENNGene, we train multiple models on the RBP24 dataset, quickly reaching the state of the art while improving the performance on more than half of the proteins by including the evolutionary conservation score and tuning the network per protein. CONCLUSIONS: As the role of DL in big data analysis in the near future is indisputable, it is important to make it available for a broader range of researchers. We believe that an easy-to-use tool such as ENNGene can allow Genomics researchers without a background in Computational Sciences to harness the power of DL to gain better insights into and extract important information from the large amounts of data available in the field.

Zobrazit více v PubMed

Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65:386–408. doi: 10.1037/h0042519. PubMed DOI

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539. PubMed DOI

Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33:831–838. doi: 10.1038/nbt.3300. PubMed DOI

Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015;12:931–934. doi: 10.1038/nmeth.3547. PubMed DOI PMC

Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016;26:990–999. doi: 10.1101/gr.200535.115. PubMed DOI PMC

Eraslan G, Avsec Ž, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. 2019;20:389–403. doi: 10.1038/s41576-019-0122-6. PubMed DOI

Budach S, Marsico A. pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks. Bioinformatics. 2018;34:3035–3037. doi: 10.1093/bioinformatics/bty222. PubMed DOI PMC

Chen KM, Cofer EM, Zhou J, Troyanskaya OG. Selene: a PyTorch-based deep learning library for sequence data. Nat Methods. 2019;16:315–318. doi: 10.1038/s41592-019-0360-8. PubMed DOI PMC

Kopp W, Monti R, Tamburrini A, Ohler U, Akalin A. Deep learning for genomics using Janggu. Nat Commun. 2020;11:3488. doi: 10.1038/s41467-020-17155-y. PubMed DOI PMC

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Wallach H, Larochelle H, Beygelzimer A, d\textquotesingle Alché-Buc F, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems. Curran Associates, Inc.; 2019. https://proceedings.neurips.cc/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf.

Sundararajan M, Taly A, Yan Q. Axiomatic Attribution for Deep Networks. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. PMLR; 2017. p. 3319–28. http://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf.

Maticzka D, Lange SJ, Costa F, Backofen R. GraphProt: modeling binding preferences of RNA-binding proteins. Genome Biol. 2014;15:R17. doi: 10.1186/gb-2014-15-1-r17. PubMed DOI PMC

Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I, Talwar K, et al. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. New York: Association for Computing Machinery; 2016. Deep Learning with Differential Privacy; pp. 308–18.

Buber E, Diri B. 2018 6th International Conference on Control Engineering Information Technology (CEIT) 2018. Performance Analysis and CPU vs GPU Comparison for Deep Learning; pp. 1–6.

Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010;20:110–121. doi: 10.1101/gr.097857.109. PubMed DOI PMC

Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, et al. Ensembl 2021. Nucleic Acids Res. 2021;49:D884–D891. doi: 10.1093/nar/gkaa942. PubMed DOI PMC

Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32 Database issue:D493–6. doi: 10.1093/nar/gkh103. PubMed DOI PMC

Lorenz R, Bernhart SH, HönerZuSiederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 20. Algorithms Mol Biol. 2011;6:26. doi: 10.1186/1748-7188-6-26. PubMed DOI PMC

Pan X, Shen H-B. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network. Neurocomputing. 2018;305:51–58. doi: 10.1016/j.neucom.2018.04.036. DOI

Ben-Bassat I, Chor B, Orenstein Y. A deep neural network approach for learning intrinsic protein-RNA binding preferences. Bioinformatics. 2018;34:i638–i646. doi: 10.1093/bioinformatics/bty600. PubMed DOI

Alsallakh B, Kokhlikyan N, Miglani V, Yuan J, Reblitz-Richardson O. Mind the Pad -- CNNs can Develop Blind Spots. arXiv [cs.CV]. 2020. http://arxiv.org/abs/2010.02178.

Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. Journal of Big Data. 2019;6:1–54. doi: 10.1186/s40537-018-0162-3. DOI

Sutskever I, Martens J, Dahl G, Geoffrey H. On the importance of initialization and momentum in deep learning. In: ICML’13: Proceedings of the 30th International Conference on International Conference on Machine Learning. 2013. p. III – 1139 – III – 1147. http://proceedings.mlr.press/v28/sutskever13.pdf.

Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude. 2012. https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf. Accessed 1 Nov 2021.

Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv [cs.LG]. 2014. http://arxiv.org/abs/1412.6980.

Smith LN. Cyclical Learning Rates for Training Neural Networks. 2015. http://arxiv.org/abs/1506.01186. Accessed 1 Nov 2021.

Chung J, Gulcehre C, Cho K, Bengio Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. 2014. http://arxiv.org/abs/1412.3555. Accessed 1 Nov 2021.

Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735. PubMed DOI

Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929–1958.

Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv [cs.LG]. 2015. http://arxiv.org/abs/1502.03167.

Deng L, Liu Y, Shi Y, Zhang W, Yang C, Liu H. Deep neural networks for inferring binding sites of RNA-binding proteins by using distributed representations of RNA primary sequence and secondary structure. BMC Genomics. 2020;21(Suppl 13):866. doi: 10.1186/s12864-020-07239-w. PubMed DOI PMC

Pan X, Fang Y, Li X, Yang Y, Shen H-B. RBPsuite: RNA-protein binding sites prediction suite based on deep learning. BMC Genomics. 2020;21:884. doi: 10.1186/s12864-020-07291-6. PubMed DOI PMC

Pan X, Rijnbeek P, Yan J, Shen H-B. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics. 2018;19. 10.1186/s12864-018-4889-1. PubMed PMC

Zhang K, Pan X, Yang Y, Shen H-B. CRIP: predicting circRNA-RBP-binding sites using a codon-based encoding and hybrid deep neural networks. RNA. 2019;25:1604–1615. doi: 10.1261/rna.070565.119. PubMed DOI PMC

Zhang S, Zhou J, Hu H, Gong H, Chen L, Cheng C, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 2016;44:e32. doi: 10.1093/nar/gkv1025. PubMed DOI PMC

Pan X, Rijnbeek P, Yan J, Shen H-B. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics. 2018;19:511. doi: 10.1186/s12864-018-4889-1. PubMed DOI PMC

Du Z, Xiao X, Uversky VN. DeepA-RBPBS: A hybrid convolution and recurrent neural network combined with attention mechanism for predicting RBP binding site. J Biomol Struct Dyn. 2020;1–9. https://pubmed.ncbi.nlm.nih.gov/33272122/. PubMed

Pan X, Shen H-B. RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach. BMC Bioinformatics. 2017;18:136. doi: 10.1186/s12859-017-1561-8. PubMed DOI PMC

Ghanbari M, Ohler U. Deep neural networks for interpreting RNA-binding protein target preferences. Genome Res. 2020;30:214–226. doi: 10.1101/gr.247494.118. PubMed DOI PMC

Park B, Han K. Discovering protein-binding RNA motifs with a generative model of RNA sequences. Comput Biol Chem. 2020;84:107171. doi: 10.1016/j.compbiolchem.2019.107171. PubMed DOI

Grønning AGB, Doktor TK, Larsen SJ, Petersen USS, Holm LL, Bruun GH, et al. DeepCLIP: predicting the effect of mutations on protein–RNA binding with deep learning. Nucleic Acids Res. 2020;48:7099–7118. PubMed PMC

Pan X, Shen H-B. Predicting RNA–protein binding sites and motifs through combining local and global deep convolutional neural networks. Bioinformatics. 2018;34:3427–3436. doi: 10.1093/bioinformatics/bty364. PubMed DOI

Yang H, Deng Z, Pan X, Shen H-B, Choi K-S, Wang L, et al. RNA-binding protein recognition based on multi-view deep feature and multi-label learning. Brief Bioinform. 2021;22. 10.1093/bib/bbaa174. PubMed

Lange SJ, Maticzka D, Möhl M, Gagnon JN, Brown CM, Backofen R. Global or local? Predicting secondary structure and accessibility in mRNAs. Nucleic Acids Res. 2012;40:5215–5226. doi: 10.1093/nar/gks181. PubMed DOI PMC

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. doi: 10.1101/gr.3715005. PubMed DOI PMC

Georgakilas GK, Grioni A, Liakos KG, Chalupova E, Plessas FC, Alexiou P. Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci. Sci Rep. 2020;10:9486. doi: 10.1038/s41598-020-66454-3. PubMed DOI PMC

Si J, Cui J, Cheng J, Wu R. computational prediction of RNA-Binding proteins and binding sites. Int J Mol Sci. 2015;16:26303–26317. doi: 10.3390/ijms161125952. PubMed DOI PMC

Nanni L, Ghidoni S, Brahnam S. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recognit. 2017;71:158–72. doi: 10.1016/j.patcog.2017.05.025. DOI

Talukder A, Barham C, Li X, Hu H. Interpretation of deep learning in genomics and epigenomics. Briefings in Bioinformatics. 2021;22. 10.1093/bib/bbaa177. PubMed PMC

Simonyan K, Vedaldi A, Zisserman A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv [cs.CV]. 2013. http://arxiv.org/abs/1312.6034.

Zeiler MD, Fergus R. Visualizing and Understanding Convolutional Networks. In: Computer Vision – ECCV 2014. Springer International Publishing; 2014. p. 818–33. https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53. DOI

Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M. SmoothGrad: removing noise by adding noise. arXiv [cs.LG]. 2017. http://arxiv.org/abs/1706.03825.

Bach S, Binder A, Montavon G, Klauschen F, Müller K-R, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One. 2015;10:e0130140. doi: 10.1371/journal.pone.0130140. PubMed DOI PMC

Montavon G, Lapuschkin S, Binder A, Samek W, Müller K-R. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit. 2017;65:211–222. doi: 10.1016/j.patcog.2016.11.008. DOI

Shrikumar A, Greenside P, Shcherbina A, Kundaje A. Not Just a Black Box: Learning Important Features Through Propagating Activation Differences. arXiv [cs.LG]. 2016. http://arxiv.org/abs/1605.01713.

Sundararajan M, Taly A, Yan Q. Axiomatic Attribution for Deep Networks. arXiv [cs.LG]. 2017. http://arxiv.org/abs/1703.01365.

Elsken T, Metzen JH, Hutter F. Neural Architecture Search: A Survey. arXiv [stat.ML]. 2018. http://arxiv.org/abs/1808.05377.

Zoph B, Le QV. Neural Architecture Search with Reinforcement Learning. arXiv [cs.LG]. 2016. http://arxiv.org/abs/1611.01578.

Zoph B, Vasudevan V, Shlens J, Le QV. Learning Transferable Architectures for Scalable Image Recognition. arXiv [cs.CV]. 2017. http://arxiv.org/abs/1707.07012.

Zhang Z, Park CY, Theesfeld CL, Troyanskaya OG. An automated framework for efficiently designing deep convolutional neural networks in genomics. Nature Machine Intelligence. 2021;3:392–400. doi: 10.1038/s42256-021-00316-z. DOI

Nejnovějších 20 citací...

Zobrazit více v
Medvik | PubMed

Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes

. 2023 Sep 25 ; 12 (10) : . [epub] 20230925

Najít záznam

Citační ukazatele

Nahrávání dat ...

Možnosti archivace

Nahrávání dat ...