Most cited article - PubMed ID 16357029
RNA-binding proteins are vital regulators in numerous biological processes. Their dysfunction can result in diverse diseases, such as cancer or neurodegenerative disorders, which makes predicting their binding sites highly important. Deep learning (DL) has brought about a revolution in various biological domains, including the field of protein-RNA interactions. Nonetheless, several challenges persist, such as the limited availability of experimentally validated binding sites for training well-performing DL models for the majority of proteins. Here, we present a novel training approach based on transfer learning (TL) to address the issue of limited data. Employing a sophisticated and interpretable architecture, we compare the performance of our method trained using two distinct approaches: training from scratch (SCR) and TL. Additionally, we benchmark our results against the current state-of-the-art methods. Furthermore, we tackle the challenges associated with selecting appropriate input features and determining optimal interval sizes. Our results show that TL enhances model performance, particularly in datasets with minimal training data, where satisfactory results can be achieved with just a few hundred RNA binding sites. Moreover, we demonstrate that integrating both sequence and evolutionary conservation information leads to superior performance. Finally, we show how incorporating an attention layer into the model facilitates the interpretation of predictions within a biologically relevant context.
- Keywords
- CLIP-seq, RNA-binding protein, deep learning, interpretation, transfer learning
- Publication type
- Journal Article MeSH
Higher-order RNA structures can mask splicing signals, loop out exons, or constitute riboswitches, all of which contribute to the complexity of splicing regulation. We identified a G to A substitution between the branch point (BP) and the 3' splice site (3'ss) of the Saccharomyces cerevisiae COF1 intron, which dramatically impaired its splicing. RNA structure prediction and in-line probing showed that this mutation disrupted a stem in the BP-3'ss region. Analyses of various COF1 intron modifications revealed that the secondary structure reduced the BP to 3'ss distance and masked potential 3'ss. We demonstrated the same structural requirement for the splicing of the UBC13 intron. Moreover, RNAfold predicted stable structures for almost all distant BP introns in S. cerevisiae and for selected examples in several other Saccharomycotina species. The use of intramolecular structure to localize the 3'ss for the second splicing step suggests the existence of a pre-mRNA structure-based mechanism of 3'ss recognition.
- MeSH
- Ascomycota genetics MeSH
- RNA, Fungal chemistry MeSH
- Introns * MeSH
- Cofilin 1 genetics MeSH
- Nucleic Acid Conformation MeSH
- RNA Splice Sites * MeSH
- Molecular Sequence Data MeSH
- Saccharomyces cerevisiae Proteins genetics MeSH
- Saccharomyces cerevisiae genetics MeSH
- Base Sequence MeSH
- RNA Splicing * MeSH
- Temperature MeSH
- Ubiquitin-Conjugating Enzymes genetics MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substances
- COF1 protein, S cerevisiae MeSH Browser
- RNA, Fungal MeSH
- Cofilin 1 MeSH
- RNA Splice Sites * MeSH
- Saccharomyces cerevisiae Proteins MeSH
- UBC13 protein, S cerevisiae MeSH Browser
- Ubiquitin-Conjugating Enzymes MeSH