When will RNA get its AlphaFold moment?
Language: English
Country: England, Great Britain
Media: print
Document type: Journal Article
Grant support
- 2019/35/B/ST6/03074, National Science Centre Poland
- European Molecular Biology Laboratory
- Politechnika Poznańska
- LM2023055, ELIXIR CZ
- RVO 86652036, Akademie Věd České Republiky
PubMed: 37702120
PubMed Central: PMC10570031
DOI: 10.1093/nar/gkad726
PII: 7272628
MeSH
- Deep Learning
- Nucleic Acid Conformation
- Models, Molecular
- RNA* (chemistry, metabolism, genetics)
- RNA Folding
- Sequence Alignment
- Software
- Machine Learning
Publication type
- Journal Article
Names of Substances
- RNA*
AlphaFold has solved the protein structure prediction problem for many types of proteins. Its success has recently generated considerable excitement about predicting the 3D structures of RNAs in the same way. RNA structure prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that several challenges will prevent the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the limited number of available structures and alignments makes data-hungry deep learning methods unlikely to succeed. Additionally, the existing structure and sequence data are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but doing so will require solving several data quality and volume issues, using data beyond simple sequence alignments, or developing new, less data-hungry machine learning methods.
Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland