Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes
Status PubMed-not-MEDLINE Language English Country Switzerland Media electronic
Document type journal articles
Grant support
101086768
HORIZON-WIDERA-2022 grant BioGeMT
CZ.02.2.69/0.0/0.0/18 053/0016952
"Postdoc2@ MUNI" by Operační program Výzkum, vývoj a vzdělávání (OPVVV)
PubMed
37886986
PubMed Central
PMC10604046
DOI
10.3390/biology12101276
PII: biology12101276
- Keywords
- CLIP-seq, RNA-binding protein, deep learning, interpretation, transfer learning
- Publication type
- journal articles (MeSH)
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, making the prediction of their binding sites of high importance. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites to train well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and utilizing TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Additionally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
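The abstract reports that combining sequence with evolutionary conservation gives the best performance, but this record does not describe how the two inputs are encoded. The following is only a minimal sketch of one plausible encoding, assuming a one-hot sequence representation with a per-nucleotide conservation score appended as a fifth channel. The function name `encode_site` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGU"  # channel order for the one-hot part (an assumption)


def encode_site(sequence: str, conservation: Sequence[float]) -> List[List[float]]:
    """Encode a candidate binding site as one row per nucleotide.

    Each row holds four one-hot sequence channels (A, C, G, U) followed by
    one conservation channel. This layout is illustrative, not the authors'.
    """
    if len(sequence) != len(conservation):
        raise ValueError("sequence and conservation must have the same length")

    rows = []
    # Treat DNA-style input as RNA by mapping T to U.
    for base, score in zip(sequence.upper().replace("T", "U"), conservation):
        # Unknown bases such as N become an all-zero one-hot vector.
        one_hot = [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        rows.append(one_hot + [float(score)])
    return rows


# Hypothetical 5-nt interval with made-up conservation scores.
encoded = encode_site("ACGTN", [0.9, 0.1, 0.5, 0.0, 0.3])
```

A matrix of this shape (interval length by five channels) could be fed to a convolutional or attention-based model. Whether the original work stacks the channels this way is an assumption.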
Central European Institute of Technology Masaryk University 625 00 Brno Czech Republic
Centre for Molecular Medicine and Biobanking University of Malta MSD 2080 Msida Malta
Department of Molecular Sociology Max Planck Institute of Biophysics 60439 Frankfurt am Main Germany