Transfer Learning Allows Accurate RBP Target Site Prediction with Limited Sample Sizes

2023 Sep 25; 12 (10). [epub] 20230925

Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid37886986

Grant support
101086768 HORIZON-WIDERA-2022 grant BioGeMT
CZ.02.2.69/0.0/0.0/18 053/0016952 "Postdoc2@ MUNI" by Operační program Výzkum, vývoj a vzdělávání (OPVVV)

RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, which makes the prediction of their binding sites highly important. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites for training well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method under two distinct training approaches: training from scratch (SCR) and TL. Additionally, we benchmark our results against current state-of-the-art methods. Furthermore, we tackle the challenges of selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly on datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Finally, we showcase how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.


