Machine Learning-Guided Protein Engineering
Status: PubMed-not-MEDLINE; Language: English; Country: United States; Medium: electronic-ecollection
Document type: journal articles, reviews
PubMed: 37942269
PubMed Central: PMC10629210
DOI: 10.1021/acscatal.3c02743
- Publication type
- journal articles (MeSH)
- reviews (MeSH)
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Wu S.; Snajdrova R.; Moore J. C.; Baldenius K.; Bornscheuer U. T. Biocatalysis: Enzymatic Synthesis for Industrial Applications. Angew. Chem., Int. Ed. Engl. 2021, 60 (1), 88–119. 10.1002/anie.202006648. PubMed DOI PMC
Bell E. L.; Finnigan W.; France S. P.; Green A. P.; Hayes M. A.; Hepworth L. J.; Lovelock S. L.; Niikura H.; Osuna S.; Romero E.; Ryan K. S.; Turner N. J.; Flitsch S. L. Biocatalysis. Nat. Rev. Methods Primers 2021, 1, 46.10.1038/s43586-021-00044-z. DOI
Silvestre B. S.; Ţîrcă D. M. Innovations for Sustainable Development: Moving toward a Sustainable Future. J. Clean. Prod. 2019, 208, 325–332. 10.1016/j.jclepro.2018.09.244. DOI
Tiso T.; Winter B.; Wei R.; Hee J.; de Witt J.; Wierckx N.; Quicker P.; Bornscheuer U. T.; Bardow A.; Nogales J.; Blank L. M. The Metabolic Potential of Plastics as Biotechnological Carbon Sources - Review and Targets for the Future. Metab. Eng. 2022, 71, 77–98. 10.1016/j.ymben.2021.12.006. PubMed DOI
Pimviriyakul P.; Wongnate T.; Tinikul R.; Chaiyen P. Microbial Degradation of Halogenated Aromatics: Molecular Mechanisms and Enzymatic Reactions. Microb. Biotechnol. 2020, 13 (1), 67–86. 10.1111/1751-7915.13488. PubMed DOI PMC
Marques S. M.; Planas-Iglesias J.; Damborsky J. Web-Based Tools for Computational Enzyme Design. Curr. Opin. Struct. Biol. 2021, 69, 19–34. 10.1016/j.sbi.2021.01.010. PubMed DOI
Chang C.; Deringer V. L.; Katti K. S.; Van Speybroeck V.; Wolverton C. M. Simulations in the Era of Exascale Computing. Nat Rev Mater 2023, 8 (5), 309–313. 10.1038/s41578-023-00540-6. PubMed DOI PMC
Pyzer-Knapp E. O.; Pitera J. W.; Staar P. W. J.; Takeda S.; Laino T.; Sanders D. P.; Sexton J.; Smith J. R.; Curioni A. Accelerating Materials Discovery Using Artificial Intelligence, High Performance Computing and Robotics. npj Comput. Mater. 2022, 8, 84.10.1038/s41524-022-00765-z. DOI
Singh V.; Patra S.; Murugan N. A.; Toncu D.-C.; Tiwari A. Recent Trends in Computational Tools and Data-Driven Modeling for Advanced Materials. Mater. Adv. 2022, 3 (10), 4069–4087. 10.1039/D2MA00067A. DOI
Greener J. G.; Kandathil S. M.; Moffat L.; Jones D. T. A Guide to Machine Learning for Biologists. Nat. Rev. Mol. Cell Biol. 2022, 23 (1), 40–55. 10.1038/s41580-021-00407-0. PubMed DOI
Beller M.; Bender M.; Bornscheuer U. T.; Schunk S. Catalysis – Far from Being a Mature Technology. Chem. Ing. Tech. 2022, 94 (11), 1559–1559. 10.1002/cite.202271102. DOI
Oza V. H.; Whitlock J. H.; Wilk E. J.; Uno-Antonison A.; Wilk B.; Gajapathy M.; Howton T. C.; Trull A.; Ianov L.; Worthey E. A.; Lasseigne B. N. Ten Simple Rules for Using Public Biological Data for Your Research. PLoS Comput. Biol. 2023, 19 (1), e101074910.1371/journal.pcbi.1010749. PubMed DOI PMC
Mazurenko S.; Prokop Z.; Damborsky J. Machine Learning in Enzyme Engineering. ACS Catal. 2020, 10 (2), 1210–1223. 10.1021/acscatal.9b04321. DOI
Strokach A.; Kim P. M. Deep Generative Modeling for Protein Design. Curr. Opin. Struct. Biol. 2022, 72, 226–236. 10.1016/j.sbi.2021.11.008. PubMed DOI
Ding W.; Nakai K.; Gong H. Protein Design via Deep Learning. Brief. Bioinform. 2022, 23 (3), bbac10210.1093/bib/bbac102. PubMed DOI PMC
Pan X.; Kortemme T. Recent Advances in de Novo Protein Design: Principles, Methods, and Applications. J. Biol. Chem. 2021, 296, 10055810.1016/j.jbc.2021.100558. PubMed DOI PMC
Chandra A.; Tünnermann L.; Löfstedt T.; Gratz R. Transformer-Based Deep Learning for Predicting Protein Properties in the Life Sciences. Elife 2023, 12, e8281910.7554/eLife.82819. PubMed DOI PMC
Lin T.; Wang Y.; Liu X.; Qiu X. A Survey of Transformers. AI Open 2022, 3, 111–132. 10.1016/j.aiopen.2022.10.001. DOI
Zhang X.-M.; Liang L.; Liu L.; Tang M.-J. Graph Neural Networks and Their Current Applications in Bioinformatics. Front. Genet. 2021, 12, 69004910.3389/fgene.2021.690049. PubMed DOI PMC
Bronstein M. M.; Bruna J.; Cohen T.; Veličković P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv [cs.LG] 2021, 10.48550/arXiv.2104.13478. DOI
Alzubaidi L.; Zhang J.; Humaidi A. J.; Al-Dujaili A.; Duan Y.; Al-Shamma O.; Santamaría J.; Fadhel M. A.; Al-Amidie M.; Farhan L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J Big Data 2021, 8, 53.10.1186/s40537-021-00444-8. PubMed DOI PMC
Calin O.Deep Learning Architectures; Springer, 2020.
Goodfellow I.; Bengio Y.; Courville A.. Deep Learning; MIT Press, 2016.
Bordin N.; Dallago C.; Heinzinger M.; Kim S.; Littmann M.; Rauer C.; Steinegger M.; Rost B.; Orengo C. Novel Machine Learning Approaches Revolutionize Protein Knowledge. Trends Biochem. Sci. 2023, 48 (4), 345–359. 10.1016/j.tibs.2022.11.001. PubMed DOI PMC
Mowbray M.; Savage T.; Wu C.; Song Z.; Cho B. A.; Del Rio-Chanona E. A.; Zhang D. Machine Learning for Biochemical Engineering: A Review. Biochemical Eng. J. 2021, 172, 108054.10.1016/j.bej.2021.108054. DOI
Hon J.; Marusiak M.; Martinek T.; Kunka A.; Zendulka J.; Bednar D.; Damborsky J. SoluProt: Prediction of Soluble Protein Expression in Escherichia Coli. Bioinformatics 2021, 37 (1), 23–28. 10.1093/bioinformatics/btaa1102. PubMed DOI PMC
Wu K. E.; Yang K. K.; van den Berg R.; Zou J. Y.; Lu A. X.; Amini A. P. Protein Structure Generation via Folding Diffusion. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2209.15611. PubMed DOI PMC
Watson J. L.; Juergens D.; Bennett N. R.; Trippe B. L.; Yim J.; Eisenach H. E.; Ahern W.; Borst A. J.; Ragotte R. J.; Milles L. F.; Wicky B. I. M.; Hanikel N.; Pellock S. J.; Courbet A.; Sheffler W.; Wang J.; Venkatesh P.; Sappington I.; Torres S. V.; Lauko A.; De Bortoli V.; Mathieu E.; Ovchinnikov S.; Barzilay R.; Jaakkola T. S.; DiMaio F.; Baek M.; Baker D. De Novo Design of Protein Structure and Function with RFdiffusion. Nature 2023, 620, 1089.10.1038/s41586-023-06415-8. PubMed DOI PMC
Corso G.; Stärk H.; Jing B.; Barzilay R.; Jaakkola T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2210.01776. DOI
Guo Z.; Liu J.; Wang Y.; Chen M.; Wang D.; Xu D.; Cheng J. Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.10907. DOI
Shroff R.; Cole A. W.; Diaz D. J.; Morrow B. R.; Donnell I.; Annapareddy A.; Gollihar J.; Ellington A. D.; Thyer R. Discovery of Novel Gain-of-Function Mutations Guided by Structure-Based Deep Learning. ACS Synth. Biol. 2020, 9 (11), 2927–2935. 10.1021/acssynbio.0c00345. PubMed DOI
Zhang Z.; Xu M.; Chenthamarakshan V.; Lozano A.; Das P.; Tang J.. Enhancing Protein Language Models with Structure-Based Encoder and Pre-Training. arXiv (Quantitative Biology.Quantitative Methods), March 11, 2023, 2303.06275, ver. 1. 10.48550/arXiv.2303.06275 DOI
Diaz D. J.; Gong C.; Ouyang-Zhang J.; Loy J. M.; Wells J.; Yang D.; Ellington A. D.; Dimakis A.; Klivans A. R.. Stability Oracle: A Structure-Based Graph-Transformer for Identifying Stabilizing Mutations. bioRxiv (Biochemistry), March 22, 2023, 2023.05.15.540857. 10.1101/2023.05.15.540857. PubMed DOI PMC
Ferruz N.; Heinzinger M.; Akdel M.; Goncearenco A.; Naef L.; Dallago C. From Sequence to Function through Structure: Deep Learning for Protein Design. Comput. Struct. Biotechnol. J. 2023, 21, 238–250. 10.1016/j.csbj.2022.11.014. PubMed DOI PMC
Madani A.; Krause B.; Greene E. R.; Subramanian S.; Mohr B. P.; Holton J. M.; Olmos J. L. Jr; Xiong C.; Sun Z. Z.; Socher R.; Fraser J. S.; Naik N. Large Language Models Generate Functional Protein Sequences across Diverse Families. Nat. Biotechnol. 2023, 41, 1099.10.1038/s41587-022-01618-2. PubMed DOI PMC
Li Y.; Rezaei M. A.; Li C.; Li X.. DeepAtom: A Framework for Protein-Ligand Binding Affinity Prediction. In Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, November 18–21, 2019; IEEE, 2019; pp 303–310.
Park S.; Seok C. GalaxyWater-CNN: Prediction of Water Positions on the Protein Structure by a 3D-Convolutional Neural Network. J. Chem. Inf. Model. 2022, 62 (13), 3157–3168. 10.1021/acs.jcim.2c00306. PubMed DOI
Ramesh A.; Dhariwal P.; Nichol A.; Chu C.; Chen M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv [cs.CV] 2022, 10.48550/ARXIV.2204.06125. DOI
Rombach R.; Blattmann A.; Lorenz D.; Esser P.; Ommer B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv [cs.CV] 2021, 10.48550/arXiv.2112.10752. DOI
Schneuing A.; Du Y.; Harris C.; Jamasb A.; Igashov I.; Du W.; Blundell T.; Lió P.; Gomes C.; Welling M.; Bronstein M.; Correia B. Structure-Based Drug Design with Equivariant Diffusion Models. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2210.13695. DOI
Igashov I.; Stärk H.; Vignac C.; Satorras V. G.; Frossard P.; Welling M.; Bronstein M. M.; Correia B.. Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design. OpenReview, February 1, 2023. https://openreview.net/forum?id=cnsHSSLnHVV.
Yang A.; Nagrani A.; Seo P. H.; Miech A.; Pont-Tuset J.; Laptev I.; Sivic J.; Schmid C.. Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, June 2022, 2023; Computer Vision Foundation, 2023; pp 10714–10726.
Huang C.; Wu Z.; Wen J.; Xu Y.; Jiang Q.; Wang Y. Abnormal Event Detection Using Deep Contrastive Learning for Intelligent Video Surveillance System. IEEE Trans. Ind. Inf. 2022, 18 (8), 5171–5179. 10.1109/TII.2021.3122801. DOI
Ho J.; Chan W.; Saharia C.; Whang J.; Gao R.; Gritsenko A.; Kingma D. P.; Poole B.; Norouzi M.; Fleet D. J.; Salimans T. Imagen Video: High Definition Video Generation with Diffusion Models. arXiv [cs.CV] 2022, 10.48550/arXiv.2210.02303. DOI
Villegas R.; Babaeizadeh M.; Kindermans P.-J.; Moraldo H.; Zhang H.; Saffar M. T.; Castro S.; Kunze J.; Erhan D.. Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=vOEXS39nOF
Singer U.; Polyak A.; Hayes T.; Yin X.; An J.; Zhang S.; Hu Q.; Yang H.; Ashual O.; Gafni O.; Parikh D.; Gupta S.; Taigman Y.. Make-A-Video: Text-to-Video Generation without Text-Video Data. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=nJfylDvgzlq
Hung M.; Lauren E.; Hon E. S.; Birmingham W. C.; Xu J.; Su S.; Hon S. D.; Park J.; Dang P.; Lipsky M. S. Social Network Analysis of COVID-19 Sentiments: Application of Artificial Intelligence. J. Med. Internet Res. 2020, 22 (8), e2259010.2196/22590. PubMed DOI PMC
Bryant P.; Pozzati G.; Elofsson A. Improved Prediction of Protein-Protein Interactions Using AlphaFold2. Nat. Commun. 2022, 13, 1265.10.1038/s41467-022-28865-w. PubMed DOI PMC
Muzio G.; O’Bray L.; Borgwardt K. Biological Network Analysis with Deep Learning. Brief. Bioinform. 2021, 22 (2), 1515–1530. 10.1093/bib/bbaa257. PubMed DOI PMC
Chen J.; Zheng S.; Zhao H.; Yang Y. Structure-Aware Protein Solubility Prediction from Sequence through Graph Convolutional Network and Predicted Contact Map. J. Cheminform. 2021, 13, 7.10.1186/s13321-021-00488-1. PubMed DOI PMC
Jiang J.; Wang R.; Wei G.-W. GGL-Tox: Geometric Graph Learning for Toxicity Prediction. J. Chem. Inf. Model. 2021, 61 (4), 1691–1700. 10.1021/acs.jcim.0c01294. PubMed DOI PMC
Hu W.; Fey M.; Zitnik M.; Dong Y.; Ren H.; Liu B.; Catasta M.; Leskovec J.; Larochelle H.; Ranzato M.; Hadsell R.; Balcan M. F.; Lin H. Open Graph Benchmark: Datasets for Machine Learning on Graphs. Adv. Neural Inf. Process. Syst. 2020, 33, 22118–22133.
Kawashima S.; Pokarowski P.; Pokarowska M.; Kolinski A.; Katayama T.; Kanehisa M. AAindex: Amino Acid Index Database, Progress Report 2008. Nucleic Acids Res. 2007, 36, D202–D205. 10.1093/nar/gkm998. PubMed DOI PMC
ElAbd H.; Bromberg Y.; Hoarfrost A.; Lenz T.; Franke A.; Wendorff M. Amino Acid Encoding for Deep Learning Applications. BMC Bioinformatics 2020, 21, 235.10.1186/s12859-020-03546-x. PubMed DOI PMC
Raimondi D.; Orlando G.; Vranken W. F.; Moreau Y. Exploring the Limitations of Biophysical Propensity Scales Coupled with Machine Learning for Protein Sequence Analysis. Sci. Rep. 2019, 9, 16932.10.1038/s41598-019-53324-w. PubMed DOI PMC
Kandathil S. M.; Greener J. G.; Lau A. M.; Jones D. T. Ultrafast End-to-End Protein Structure Prediction Enables High-Throughput Exploration of Uncharacterized Proteins. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (4), e211334811910.1073/pnas.2113348119. PubMed DOI PMC
Fasoulis R.; Paliouras G.; Kavraki L. E. Graph Representation Learning for Structural Proteomics. Emerg Top Life Sci 2021, 5 (6), 789–802. 10.1042/ETLS20210225. PubMed DOI PMC
Hermosilla P.; Schäfer M.; Lang M.; Fackelmann G.; Vázquez P. P.; Kozlíková B.; Krone M.; Ritschel T.; Ropinski T.. Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures. Ninth International Conference on Learning Representations, May 3–7, 2021; OpenReview, 2021.
Batzner S.; Musaelian A.; Sun L.; Geiger M.; Mailoa J. P.; Kornbluth M.; Molinari N.; Smidt T. E.; Kozinsky B. E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials. Nat. Commun. 2022, 13, 2453.10.1038/s41467-022-29939-5. PubMed DOI PMC
Gligorijević V.; Renfrew P. D.; Kosciolek T.; Leman J. K.; Berenberg D.; Vatanen T.; Chandler C.; Taylor B. C.; Fisk I. M.; Vlamakis H.; Xavier R. J.; Knight R.; Cho K.; Bonneau R. Structure-Based Protein Function Prediction Using Graph Convolutional Networks. Nat. Commun. 2021, 12, 3168.10.1038/s41467-021-23303-9. PubMed DOI PMC
Gao Z.; Jiang C.; Zhang J.; Jiang X.; Li L.; Zhao P.; Yang H.; Huang Y.; Li J. Hierarchical Graph Learning for Protein-Protein Interaction. Nat. Commun. 2023, 14, 1093.10.1038/s41467-023-36736-1. PubMed DOI PMC
Vaswani A.; Shazeer N.; Parmar N.; Uszkoreit J.; Jones L.; Gomez A. N.; Kaiser Ł.; Polosukhin I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5999.
Fuchs F.; Worrall D.; Fischer V.; Welling M.; Larochelle H.; Ranzato M.; Hadsell R.; Balcan M. F.; Lin H. SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks. Adv. Neural Inf. Process. Syst. 2020, 33, 1970–1981.
Detlefsen N. S.; Hauberg S.; Boomsma W. Learning Meaningful Representations of Protein Sequences. Nat. Commun. 2022, 13, 1914.10.1038/s41467-022-29443-w. PubMed DOI PMC
Rives A.; Meier J.; Sercu T.; Goyal S.; Lin Z.; Liu J.; Guo D.; Ott M.; Zitnick C. L.; Ma J.; Fergus R. Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Proc. Natl. Acad. Sci. U. S. A. 2021, 118 (15), e201623911810.1073/pnas.2016239118. PubMed DOI PMC
Meier J.; Rao R.; Verkuil R.; Liu J.; Sercu T.; Rives A.; Ranzato M.; Beygelzimer A.; Dauphin Y.; Liang P. S.; Vaughan J. W. Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function. Adv. Neural Inf. Process. Syst. 2021, 34, 29287–29303.
Lin Z.; Akin H.; Rao R.; Hie B.; Zhu Z.; Lu W.; Smetanin N.; Verkuil R.; Kabeli O.; Shmueli Y.; Dos Santos Costa A.; Fazel-Zarandi M.; Sercu T.; Candido S.; Rives A. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379 (6637), 1123–1130. 10.1126/science.ade2574. PubMed DOI
Zhang Z.; Xu M.; Jamasb A.; Chenthamarakshan V.; Lozano A.; Das P.; Tang J.. Protein Representation Learning by Geometric Structure Pretraining. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=to3qCB3tOh9
Fowler D. M.; Fields S. Deep Mutational Scanning: A New Style of Protein Science. Nat. Methods 2014, 11 (8), 801–807. 10.1038/nmeth.3027. PubMed DOI PMC
Vanella R.; Kovacevic G.; Doffini V.; Fernández de Santaella J.; Nash M. A. High-Throughput Screening, next Generation Sequencing and Machine Learning: Advanced Methods in Enzyme Engineering. Chem. Commun. 2022, 58 (15), 2455–2467. 10.1039/D1CC04635G. PubMed DOI PMC
Morrison K. L.; Weiss G. A. Combinatorial Alanine-Scanning. Curr. Opin. Chem. Biol. 2001, 5 (3), 302–307. 10.1016/S1367-5931(00)00206-4. PubMed DOI
Brown T.; Mann B.; Ryder N.; Subbiah M.; Kaplan J. D.; Dhariwal P.; Neelakantan A.; Shyam P.; Sastry G.; Askell A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
OpenAI . GPT-4 Technical Report. arXiv (Computer Science.Computation and Language), March 27, 2023, 2303.08774. https://arxiv.org/abs/2303.08774.
Luo R.; Sun L.; Xia Y.; Qin T.; Zhang S.; Poon H.; Liu T.-Y. BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining. Brief. Bioinform. 2022, 23 (6), bbac40910.1093/bib/bbac409. PubMed DOI
Zhu Z.; Shi C.; Zhang Z.; Liu S.; Xu M.; Yuan X.; Zhang Y.; Chen J.; Cai H.; Lu J.; Ma C.; Liu R.; Xhonneux L.-P.; Qu M.; Tang J. TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery. arXiv [cs.LG] 2022, 10.48550/arXiv.2202.08320. DOI
Siedhoff N. E.; Illig A.-M.; Schwaneberg U.; Davari M. D. PyPEF-An Integrated Framework for Data-Driven Protein Engineering. J. Chem. Inf. Model. 2021, 61 (7), 3463–3476. 10.1021/acs.jcim.1c00099. PubMed DOI
Draizen E. J.; Murillo L. F. R.; Readey J.; Mura C.; Bourne P. E.. Prop3D: A Flexible, Python-Based Platform for Machine Learning with Protein Structural Properties and Biophysical Data. bioRxiv, 2022, 2022.12.27.522071. 10.1101/2022.12.27.522071. PubMed DOI PMC
Berman H. M.; Westbrook J.; Feng Z.; Gilliland G.; Bhat T. N.; Weissig H.; Shindyalov I. N.; Bourne P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242. 10.1093/nar/28.1.235. PubMed DOI PMC
Chothia C.; Lesk A. M. The Relation between the Divergence of Sequence and Structure in Proteins. EMBO J. 1986, 5 (4), 823–826. 10.1002/j.1460-2075.1986.tb04288.x. PubMed DOI PMC
van Kempen M.; Kim S. S.; Tumescheit C.; Mirdita M.; Lee J.; Gilchrist C. L. M.; Söding J.; Steinegger M. Fast and Accurate Protein Structure Search with Foldseek. Nat. Biotechnol. 2023, 10.1038/s41587-023-01773-0. PubMed DOI PMC
Brookes D.; Park H.; Listgarten J.. Conditioning by Adaptive Sampling for Robust Design. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri K., Salakhutdinov R., Eds.; Proceedings of Machine Learning Research, Vol. 97; PMLR, 2019; pp 773–782.
Sinai S.; Wang R.; Whatley A.; Slocum S.; Locane E.; Kelsic E. D.. AdaLead: A Simple and Robust Adaptive Greedy Search Algorithm for Sequence Design. arXiv (Computer Science.Machine Learning), October 5, 2020, 2010.02141, ver. 1.10.48550/arXiv.2010.02141 DOI
Ren Z.; Li J.; Ding F.; Zhou Y.; Ma J.; Peng J.. Proximal Exploration for Model-Guided Protein Sequence Design. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri K., Jegelka S., Song L., Szepesvari C., Niu G., Sabato S., Eds.; Proceedings of Machine Learning Research, Vol. 162; PMLR, 2022; pp 18520–18536.
Lipsh-Sokolik R.; Khersonsky O.; Schröder S. P.; de Boer C.; Hoch S.-Y.; Davies G. J.; Overkleeft H. S.; Fleishman S. J. Combinatorial Assembly and Design of Enzymes. Science 2023, 379 (6628), 195–201. 10.1126/science.ade9434. PubMed DOI
Yu T.; Boob A. G.; Volk M. J.; Liu X.; Cui H.; Zhao H. Machine Learning-Enabled Retrobiosynthesis of Molecules. Nat. Catal. 2023, 6, 137.10.1038/s41929-022-00909-w. DOI
Mistry J.; Chuguransky S.; Williams L.; Qureshi M.; Salazar G. A.; Sonnhammer E. L. L.; Tosatto S. C. E.; Paladin L.; Raj S.; Richardson L. J.; Finn R. D.; Bateman A. Pfam: The Protein Families Database in 2021. Nucleic Acids Res. 2021, 49 (D1), D412–D419. 10.1093/nar/gkaa913. PubMed DOI PMC
Pandurangan A. P.; Stahlhacke J.; Oates M. E.; Smithers B.; Gough J. The SUPERFAMILY 2.0 Database: A Significant Proteome Update and a New Webserver. Nucleic Acids Res. 2019, 47 (D1), D490–D494. 10.1093/nar/gky1130. PubMed DOI PMC
Sillitoe I.; Bordin N.; Dawson N.; Waman V. P.; Ashford P.; Scholes H. M.; Pang C. S. M.; Woodridge L.; Rauer C.; Sen N.; Abbasian M.; Le Cornu S.; Lam S. D.; Berka K.; Varekova I. H.; Svobodova R.; Lees J.; Orengo C. A. CATH: Increased Structural Coverage of Functional Space. Nucleic Acids Res. 2021, 49 (D1), D266–D273. 10.1093/nar/gkaa1079. PubMed DOI PMC
Alcántara R.; Axelsen K. B.; Morgat A.; Belda E.; Coudert E.; Bridge A.; Cao H.; de Matos P.; Ennis M.; Turner S.; Owen G.; Bougueleret L.; Xenarios I.; Steinbeck C. Rhea--a Manually Curated Resource of Biochemical Reactions. Nucleic Acids Res. 2012, 40, D754–D760. 10.1093/nar/gkr1126. PubMed DOI PMC
Schomburg I.; Chang A.; Schomburg D. BRENDA, Enzyme Data and Metabolic Information. Nucleic Acids Res. 2002, 30 (1), 47–49. 10.1093/nar/30.1.47. PubMed DOI PMC
Wittig U.; Rey M.; Weidemann A.; Kania R.; Müller W. SABIO-RK: An Updated Resource for Manually Curated Biochemical Reaction Kinetics. Nucleic Acids Res. 2018, 46 (D1), D656–D660. 10.1093/nar/gkx1065. PubMed DOI PMC
Wishart D. S.; Li C.; Marcu A.; Badran H.; Pon A.; Budinski Z.; Patron J.; Lipton D.; Cao X.; Oler E.; Li K.; Paccoud M.; Hong C.; Guo A. C.; Chan C.; Wei W.; Ramirez-Gaona M. PathBank: A Comprehensive Pathway Database for Model Organisms. Nucleic Acids Res. 2020, 48 (D1), D470–D478. 10.1093/nar/gkz861. PubMed DOI PMC
Hafner J.; MohammadiPeyhani H.; Sveshnikova A.; Scheidegger A.; Hatzimanikatis V. Updated ATLAS of Biochemistry with New Metabolites and Improved Enzyme Prediction Power. ACS Synth. Biol. 2020, 9 (6), 1479–1482. 10.1021/acssynbio.0c00052. PubMed DOI PMC
Ganter M.; Bernard T.; Moretti S.; Stelling J.; Pagni M. Metanetx.org: A Website and Repository for Accessing, Analysing and Manipulating Metabolic Networks. Bioinformatics 2013, 29 (6), 815–816. 10.1093/bioinformatics/btt036. PubMed DOI PMC
Bairoch A. The ENZYME Database in 2000. Nucleic Acids Res. 2000, 28 (1), 304–305. 10.1093/nar/28.1.304. PubMed DOI PMC
McDonald A. G.; Tipton K. F. Enzyme Nomenclature and Classification: The State of the Art. FEBS J. 2023, 290 (9), 2214–2231. 10.1111/febs.16274. PubMed DOI
Probst D.; Manica M.; Nana Teukam Y. G.; Castrogiovanni A.; Paratore F.; Laino T. Biocatalysed Synthesis Planning Using Data-Driven Learning. Nat. Commun. 2022, 13, 964.10.1038/s41467-022-28536-w. PubMed DOI PMC
Heid E.; Probst D.; Green W. H.; Madsen G. K. H. EnzymeMap: Curation, Validation and Data-Driven Prediction of Enzymatic Reactions. ChemRxiv 2023, 10.26434/chemrxiv-2023-jzw9w. PubMed DOI PMC
Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Žídek A.; Potapenko A.; Bridgland A.; Meyer C.; Kohl S. A. A.; Ballard A. J.; Cowie A.; Romera-Paredes B.; Nikolov S.; Jain R.; Adler J.; Back T.; Petersen S.; Reiman D.; Clancy E.; Zielinski M.; Steinegger M.; Pacholska M.; Berghammer T.; Bodenstein S.; Silver D.; Vinyals O.; Senior A. W.; Kavukcuoglu K.; Kohli P.; Hassabis D. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589. 10.1038/s41586-021-03819-2. PubMed DOI PMC
Bileschi M. L.; Belanger D.; Bryant D. H.; Sanderson T.; Carter B.; Sculley D.; Bateman A.; DePristo M. A.; Colwell L. J. Using Deep Learning to Annotate the Protein Universe. Nat. Biotechnol. 2022, 40 (6), 932–937. 10.1038/s41587-021-01179-w. PubMed DOI
Nallapareddy V.; Bordin N.; Sillitoe I.; Heinzinger M.; Littmann M.; Waman V. P.; Sen N.; Rost B.; Orengo C. CATHe: Detection of Remote Homologues for CATH Superfamilies Using Embeddings from Protein Language Models. Bioinformatics 2023, 39, btad02910.1093/bioinformatics/btad029. PubMed DOI PMC
Jiang S.-Y.; Jin J.; Sarojam R.; Ramachandran S. A Comprehensive Survey on the Terpene Synthase Gene Family Provides New Insight into Its Evolutionary Patterns. Genome Biol. Evol. 2019, 11 (8), 2078–2098. 10.1093/gbe/evz142. PubMed DOI PMC
Claudel-Renard C.; Chevalet C.; Faraut T.; Kahn D. Enzyme-Specific Profiles for Genome Annotation: PRIAM. Nucleic Acids Res. 2003, 31 (22), 6633–6639. 10.1093/nar/gkg847. PubMed DOI PMC
Shen H.-B.; Chou K.-C. EzyPred: A Top–down Approach for Predicting Enzyme Functional Classes and Subclasses. Biochem. Biophys. Res. Commun. 2007, 364 (1), 53–59. 10.1016/j.bbrc.2007.09.098. PubMed DOI
Dalkiran A.; Rifaioglu A. S.; Martin M. J.; Cetin-Atalay R.; Atalay V.; Doğan T. ECPred: A Tool for the Prediction of the Enzymatic Functions of Protein Sequences Based on the EC Nomenclature. BMC Bioinformatics 2018, 19, 334.10.1186/s12859-018-2368-y. PubMed DOI PMC
Huang W.-L.; Chen H.-M.; Hwang S.-F.; Ho S.-Y. Accurate Prediction of Enzyme Subfamily Class Using an Adaptive Fuzzy K-Nearest Neighbor Method. Biosystems. 2007, 90 (2), 405–413. 10.1016/j.biosystems.2006.10.004. PubMed DOI
Nasibov E.; Kandemir-Cavas C. Efficiency Analysis of KNN and Minimum Distance-Based Classifiers in Enzyme Family Prediction. Comput. Biol. Chem. 2009, 33 (6), 461–464. 10.1016/j.compbiolchem.2009.09.002. PubMed DOI
De Ferrari L.; Aitken S.; van Hemert J.; Goryanin I. EnzML: Multi-Label Prediction of Enzyme Classes Using InterPro Signatures. BMC Bioinformatics 2012, 13, 61.10.1186/1471-2105-13-61. PubMed DOI PMC
Dobson P. D.; Doig A. J. Predicting Enzyme Class from Protein Structure without Alignments. J. Mol. Biol. 2005, 345 (1), 187–199. 10.1016/j.jmb.2004.10.024. PubMed DOI
Kumar N.; Skolnick J. EFICAz2.5: Application of a High-Precision Enzyme Function Predictor to 396 Proteomes. Bioinformatics 2012, 28 (20), 2687–2688. 10.1093/bioinformatics/bts510. PubMed DOI PMC
Matsuta Y.; Ito M.; Tohsato Y. ECOH: An Enzyme Commission Number Predictor Using Mutual Information and a Support Vector Machine. Bioinformatics 2013, 29 (3), 365–372. 10.1093/bioinformatics/bts700. PubMed DOI
Li Y. H.; Xu J. Y.; Tao L.; Li X. F.; Li S.; Zeng X.; Chen S. Y.; Zhang P.; Qin C.; Zhang C.; Chen Z.; Zhu F.; Chen Y. Z. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLoS One 2016, 11 (8), e015529010.1371/journal.pone.0155290. PubMed DOI PMC
Nagao C.; Nagano N.; Mizuguchi K. Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests. PLoS One 2014, 9 (1), e8462310.1371/journal.pone.0084623. PubMed DOI PMC
Kumar C.; Choudhary A. A Top-down Approach to Classify Enzyme Functional Classes and Sub-Classes Using Random Forest. EURASIP J. Bioinform. Syst. Biol. 2012, 2012, 1.10.1186/1687-4153-2012-1. PubMed DOI PMC
Volpato V.; Adelfio A.; Pollastri G. Accurate Prediction of Protein Enzymatic Class by N-to-1 Neural Networks. BMC Bioinformatics 2013, 14, S11.10.1186/1471-2105-14-S1-S11. PubMed DOI PMC
Amidi A.; Amidi S.; Vlachakis D.; Megalooikonomou V.; Paragios N.; Zacharaki E. I. EnzyNet: Enzyme Classification Using 3D Convolutional Neural Networks on Spatial Representation. PeerJ 2018, 6, e475010.7717/peerj.4750. PubMed DOI PMC
Ryu J. Y.; Kim H. U.; Lee S. Y. Deep Learning Enables High-Quality and High-Throughput Prediction of Enzyme Commission Numbers. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (28), 13996–14001. 10.1073/pnas.1821905116. PubMed DOI PMC
Sanderson T.; Bileschi M. L.; Belanger D.; Colwell L. J. ProteInfer, Deep Neural Networks for Protein Functional Inference. Elife 2023, 12, e8094210.7554/eLife.80942. PubMed DOI PMC
Yu T.; Cui H.; Li J. C.; Luo Y.; Jiang G.; Zhao H. Enzyme Function Prediction Using Contrastive Learning. Science 2023, 379 (6639), 1358–1363. 10.1126/science.adf2465. PubMed DOI
Levin I.; Liu M.; Voigt C. A.; Coley C. W. Merging Enzymatic and Synthetic Chemistry with Computational Synthesis Planning. Nat. Commun. 2022, 13, 7747.10.1038/s41467-022-35422-y. PubMed DOI PMC
Zheng S.; Zeng T.; Li C.; Chen B.; Coley C. W.; Yang Y.; Wu R. Deep Learning Driven Biosynthetic Pathways Navigation for Natural Products with BioNavi-NP. Nat. Commun. 2022, 13, 3342.10.1038/s41467-022-30970-9. PubMed DOI PMC
Watanabe N.; Yamamoto M.; Murata M.; Vavricka C. J.; Ogino C.; Kondo A.; Araki M. Comprehensive Machine Learning Prediction of Extensive Enzymatic Reactions. J. Phys. Chem. B 2022, 126 (36), 6762–6770. 10.1021/acs.jpcb.2c03287. PubMed DOI
Kroll A.; Ranjan S.; Engqvist M. K. M.; Lercher M. J. A General Model to Predict Small Molecule Substrates of Enzymes Based on Machine and Deep Learning. Nat. Commun. 2023, 14, 2787.10.1038/s41467-023-38347-2. PubMed DOI PMC
Goldman S.; Das R.; Yang K. K.; Coley C. W. Machine Learning Modeling of Family Wide Enzyme-Substrate Specificity Screens. PLoS Comput. Biol. 2022, 18 (2), e100985310.1371/journal.pcbi.1009853. PubMed DOI PMC
Berman H. M.; Gabanyi M. J.; Kouranov A.; Micallef D. I.; Westbrook J.; Protein Structure Initiative network of investigators . Protein Structure Initiative - TargetTrack 2000-2017 - all data files. Zenodo, 2017. 10.5281/zenodo.821654 DOI
Jarzab A.; Kurzawa N.; Hopf T.; Moerch M.; Zecha J.; Leijten N.; Bian Y.; Musiol E.; Maschberger M.; Stoehr G.; Becher I.; Daly C.; Samaras P.; Mergner J.; Spanier B.; Angelov A.; Werner T.; Bantscheff M.; Wilhelm M.; Klingenspor M.; Lemeer S.; Liebl W.; Hahne H.; Savitski M. M.; Kuster B. Meltome Atlas-Thermal Proteome Stability across the Tree of Life. Nat. Methods 2020, 17 (5), 495–503. 10.1038/s41592-020-0801-4. PubMed DOI
Yang Y.; Zhao J.; Zeng L.; Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int. J. Mol. Sci. 2022, 23 (18), 10798.10.3390/ijms231810798. PubMed DOI PMC
Sapoval N.; Aghazadeh A.; Nute M. G.; Antunes D. A.; Balaji A.; Baraniuk R.; Barberan C. J.; Dannenfelser R.; Dun C.; Edrisi M.; Elworth R. A. L.; Kille B.; Kyrillidis A.; Nakhleh L.; Wolfe C. R.; Yan Z.; Yao V.; Treangen T. J. Current Progress and Open Challenges for Applying Deep Learning across the Biosciences. Nat. Commun. 2022, 13, 1728.10.1038/s41467-022-29268-7. PubMed DOI PMC
Diaz D. J.; Kulikova A. V.; Ellington A. D.; Wilke C. O. Using Machine Learning to Predict the Effects and Consequences of Mutations in Proteins. Curr. Opin. Struct. Biol. 2023, 78, 10251810.1016/j.sbi.2022.102518. PubMed DOI PMC
Thumuluri V.; Martiny H.-M.; Almagro Armenteros J. J.; Salomon J.; Nielsen H.; Johansen A. R. NetSolP: Predicting Protein Solubility in Escherichia Coli Using Language Models. Bioinformatics 2022, 38 (4), 941–946. 10.1093/bioinformatics/btab801. PubMed DOI
Caldararu O.; Mehra R.; Blundell T. L.; Kepp K. P. Systematic Investigation of the Data Set Dependency of Protein Stability Predictors. J. Chem. Inf. Model. 2020, 60 (10), 4772–4784. 10.1021/acs.jcim.0c00591. PubMed DOI
Mazurenko S. Predicting Protein Stability and Solubility Changes upon Mutations: Data Perspective. ChemCatChem 2020, 12 (22), 5590–5598. 10.1002/cctc.202000933. DOI
Velecký J.; Hamsikova M.; Stourac J.; Musil M.; Damborsky J.; Bednar D.; Mazurenko S. SoluProtMutDB: A Manually Curated Database of Protein Solubility Changes upon Mutations. Comput. Struct. Biotechnol. J. 2022, 20, 6339–6347. 10.1016/j.csbj.2022.11.009. PubMed DOI PMC
Wang S.; Tang H.; Zhao Y.; Zuo L. BayeStab: Predicting Effects of Mutations on Protein Stability with Uncertainty Quantification. Protein Sci. 2022, 31 (11), e4467. 10.1002/pro.4467. PubMed DOI PMC
Nikam R.; Kulandaisamy A.; Harini K.; Sharma D.; Gromiha M. M. ProThermDB: Thermodynamic Database for Proteins and Mutants Revisited after 15 Years. Nucleic Acids Res. 2021, 49 (D1), D420–D424. 10.1093/nar/gkaa1035. PubMed DOI PMC
Iqbal S.; Ge F.; Li F.; Akutsu T.; Zheng Y.; Gasser R. B.; Yu D.-J.; Webb G. I.; Song J. PROST: AlphaFold2-Aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations. J. Chem. Inf. Model. 2022, 62 (17), 4270–4282. 10.1021/acs.jcim.2c00799. PubMed DOI
Hernández I. M.; Dehouck Y.; Bastolla U.; López-Blanco J. R.; Chacón P. Predicting Protein Stability Changes upon Mutation Using a Simple Orientational Potential. Bioinformatics 2023, 39 (1), btad011. 10.1093/bioinformatics/btad011. PubMed DOI PMC
Xavier J. S.; Nguyen T.-B.; Karmarkar M.; Portelli S.; Rezende P. M.; Velloso J. P. L.; Ascher D. B.; Pires D. E. V. ThermoMutDB: A Thermodynamic Database for Missense Mutations. Nucleic Acids Res. 2021, 49 (D1), D475–D479. 10.1093/nar/gkaa925. PubMed DOI PMC
Pak M. A.; Markhieva K. A.; Novikova M. S.; Petrov D. S.; Vorobyev I. S.; Maksimova E. S.; Kondrashov F. A.; Ivankov D. N. Using AlphaFold to Predict the Impact of Single Mutations on Protein Stability and Function. PLoS One 2023, 18 (3), e0282689. 10.1371/journal.pone.0282689. PubMed DOI PMC
Tsuboyama K.; Dauparas J.; Chen J.; Laine E.; Mohseni Behbahani Y.; Weinstein J. J.; Mangan N. M.; Ovchinnikov S.; Rocklin G. J. Mega-Scale Experimental Analysis of Protein Folding Stability in Biology and Design. Nature 2023, 620 (7973), 434–444. 10.1038/s41586-023-06328-6. PubMed DOI PMC
Yang Y.; Zeng L.; Vihinen M. PON-Sol2: Prediction of Effects of Variants on Protein Solubility. Int. J. Mol. Sci. 2021, 22 (15), 8027. 10.3390/ijms22158027. PubMed DOI PMC
Li F.; Yuan L.; Lu H.; Li G.; Chen Y.; Engqvist M. K. M.; Kerkhoven E. J.; Nielsen J. Deep Learning-Based Kcat Prediction Enables Improved Enzyme-Constrained Model Reconstruction. Nature Catalysis 2022, 5 (8), 662–672. 10.1038/s41929-022-00798-z. DOI
Xie W. J.; Asadi M.; Warshel A. Enhancing Computational Enzyme Design by a Maximum Entropy Strategy. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (7), e2122355119. 10.1073/pnas.2122355119. PubMed DOI PMC
Ostafe R.; Fontaine N.; Frank D.; Ng Fuk Chong M.; Prodanovic R.; Pandjaitan R.; Offmann B.; Cadet F.; Fischer R. One-Shot Optimization of Multiple Enzyme Parameters: Tailoring Glucose Oxidase for pH and Electron Mediators. Biotechnol. Bioeng. 2020, 117 (1), 17–29. 10.1002/bit.27169. PubMed DOI
Høie M. H.; Cagiada M.; Beck Frederiksen A. H.; Stein A.; Lindorff-Larsen K. Predicting and Interpreting Large-Scale Mutagenesis Data Using Analyses of Protein Stability and Conservation. Cell Rep. 2022, 38 (2), 110207. 10.1016/j.celrep.2021.110207. PubMed DOI
Cendrowska J. PRISM: An Algorithm for Inducing Modular Rules. Int. J. Man. Mach. Stud. 1987, 27 (4), 349–370. 10.1016/S0020-7373(87)80003-2. DOI
Gupta A.; Agrawal S. Machine Learning-Based Enzyme Engineering of PETase for Improved Efficiency in Degrading Non-Biodegradable Plastic. bioRxiv 2022, 10.1101/2022.01.11.475766. DOI
Gado J. E.; Beckham G. T.; Payne C. M. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J. Chem. Inf. Model. 2020, 60 (8), 4098–4107. 10.1021/acs.jcim.0c00489. PubMed DOI
Voutilainen S.; Heinonen M.; Andberg M.; Jokinen E.; Maaheimo H.; Pääkkönen J.; Hakulinen N.; Rouvinen J.; Lähdesmäki H.; Kaski S.; Rousu J.; Penttilä M.; Koivula A. Substrate Specificity of 2-Deoxy-D-Ribose 5-Phosphate Aldolase (DERA) Assessed by Different Protein Engineering and Machine Learning Methods. Appl. Microbiol. Biotechnol. 2020, 104 (24), 10515–10529. 10.1007/s00253-020-10960-x. PubMed DOI PMC
Prabakaran R.; Rawat P.; Kumar S.; Michael Gromiha M. ANuPP: A Versatile Tool to Predict Aggregation Nucleating Regions in Peptides and Proteins. J. Mol. Biol. 2021, 433 (11), 166707. 10.1016/j.jmb.2020.11.006. PubMed DOI
Thangakani A. M.; Nagarajan R.; Kumar S.; Sakthivel R.; Velmurugan D.; Gromiha M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11 (4), e0152949. 10.1371/journal.pone.0152949. PubMed DOI PMC
Rawat P.; Prabakaran R.; Sakthivel R.; Mary Thangakani A.; Kumar S.; Gromiha M. M. CPAD 2.0: A Repository of Curated Experimental Data on Aggregating Proteins and Peptides. Amyloid 2020, 27 (2), 128–133. 10.1080/13506129.2020.1715363. PubMed DOI
Beerten J.; Van Durme J.; Gallardo R.; Capriotti E.; Serpell L.; Rousseau F.; Schymkowitz J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31 (10), 1698–1700. 10.1093/bioinformatics/btv027. PubMed DOI
Louros N.; Konstantoulea K.; De Vleeschouwer M.; Ramakers M.; Schymkowitz J.; Rousseau F. WALTZ-DB 2.0: An Updated Database Containing Structural Information of Experimentally Determined Amyloid-Forming Peptides. Nucleic Acids Res. 2020, 48 (D1), D389–D393. 10.1093/nar/gkz758. PubMed DOI PMC
Wozniak P. P.; Kotulska M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31 (20), 3395–3397. 10.1093/bioinformatics/btv375. PubMed DOI
Liu X.; Luo Y.; Li P.; Song S.; Peng J. Deep Geometric Representations for Modeling Effects of Mutations on Protein-Protein Binding Affinity. PLoS Comput. Biol. 2021, 17 (8), e1009284. 10.1371/journal.pcbi.1009284. PubMed DOI PMC
Jankauskaite J.; Jiménez-García B.; Dapkunas J.; Fernández-Recio J.; Moal I. H. SKEMPI 2.0: An Updated Benchmark of Changes in Protein-Protein Binding Energy, Kinetics and Thermodynamics upon Mutation. Bioinformatics 2019, 35 (3), 462–469. 10.1093/bioinformatics/bty635. PubMed DOI PMC
Stourac J.; Dubrava J.; Musil M.; Horackova J.; Damborsky J.; Mazurenko S.; Bednar D. FireProtDB: Database of Manually Curated Protein Stability Data. Nucleic Acids Res. 2021, 49 (D1), D319–D324. 10.1093/nar/gkaa981. PubMed DOI PMC
Pancotti C.; Benevenuta S.; Birolo G.; Alberini V.; Repetto V.; Sanavia T.; Capriotti E.; Fariselli P. Predicting Protein Stability Changes upon Single-Point Mutation: A Thorough Comparison of the Available Tools on a New Dataset. Brief. Bioinform. 2022, 23 (2), bbab555. 10.1093/bib/bbab555. PubMed DOI PMC
Livesey B. J.; Marsh J. A. Updated Benchmarking of Variant Effect Predictors Using Deep Mutational Scanning. bioRxiv 2022, 10.1101/2022.11.19.517196. PubMed DOI PMC
Dunham A. S.; Beltrao P. Exploring Amino Acid Functions in a Deep Mutational Landscape. Mol. Syst. Biol. 2021, 17 (7), e10305. 10.15252/msb.202110305. PubMed DOI PMC
Reeb J.; Wirth T.; Rost B. Variant Effect Predictions Capture Some Aspects of Deep Mutational Scanning Experiments. BMC Bioinformatics 2020, 21, 107. 10.1186/s12859-020-3439-4. PubMed DOI PMC
Gray V. E.; Hause R. J.; Luebeck J.; Shendure J.; Fowler D. M. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. Cell Syst 2018, 6 (1), 116–124.e3. 10.1016/j.cels.2017.11.003. PubMed DOI PMC
Notin P.; Dias M.; Frazer J.; Hurtado J. M.; Gomez A. N.; Marks D.; Gal Y. Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-Time Retrieval. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri K., Jegelka S., Song L., Szepesvari C., Niu G., Sabato S., Eds.; Proceedings of Machine Learning Research, Vol. 162; PMLR, 2022; pp 16990–17017.
Markin C. J.; Mokhtari D. A.; Sunden F.; Appel M. J.; Akiva E.; Longwell S. A.; Sabatti C.; Herschlag D.; Fordyce P. M. Revealing Enzyme Functional Architecture via High-Throughput Microfluidic Enzyme Kinetics. Science 2021, 373 (6553), eabf8761. 10.1126/science.abf8761. PubMed DOI PMC
Thompson S.; Zhang Y.; Ingle C.; Reynolds K. A.; Kortemme T. Altered Expression of a Quality Control Protease in E. Coli Reshapes the in Vivo Mutational Landscape of a Model Enzyme. Elife 2020, 9, e53476. 10.7554/eLife.53476. PubMed DOI PMC
Nikoomanzar A.; Vallejo D.; Chaput J. C. Elucidating the Determinants of Polymerase Specificity by Microfluidic-Based Deep Mutational Scanning. ACS Synth. Biol. 2019, 8 (6), 1421–1429. 10.1021/acssynbio.9b00104. PubMed DOI
Mighell T. L.; Thacker S.; Fombonne E.; Eng C.; O’Roak B. J. An Integrated Deep-Mutational-Scanning Approach Provides Clinical Insights on PTEN Genotype-Phenotype Relationships. Am. J. Hum. Genet. 2020, 106 (6), 818–829. 10.1016/j.ajhg.2020.04.014. PubMed DOI PMC
Wang X.; Zhang X.; Peng C.; Shi Y.; Li H.; Xu Z.; Zhu W. D3DistalMutation: A Database to Explore the Effect of Distal Mutations on Enzyme Activity. J. Chem. Inf. Model. 2021, 61 (5), 2499–2508. 10.1021/acs.jcim.1c00318. PubMed DOI
Ma E. J.; Siirola E.; Moore C.; Kummer A.; Stoeckli M.; Faller M.; Bouquet C.; Eggimann F.; Ligibel M.; Huynh D.; Cutler G.; Siegrist L.; Lewis R. A.; Acker A.-C.; Freund E.; Koch E.; Vogel M.; Schlingensiepen H.; Oakeley E. J.; Snajdrova R. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity. ACS Catal. 2021, 11 (20), 12433–12445. 10.1021/acscatal.1c02786. DOI
Wu Z.; Kan S. B. J.; Lewis R. D.; Wittmann B. J.; Arnold F. H. Machine Learning-Assisted Directed Protein Evolution with Combinatorial Libraries. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (18), 8852–8858. 10.1073/pnas.1901979116. PubMed DOI PMC
Li G.; Qin Y.; Fontaine N. T.; Ng Fuk Chong M.; Maria-Solano M. A.; Feixas F.; Cadet X. F.; Pandjaitan R.; Garcia-Borràs M.; Cadet F.; Reetz M. T. Machine Learning Enables Selection of Epistatic Enzyme Mutants for Stability Against Unfolding and Detrimental Aggregation. Chembiochem 2021, 22 (5), 904–914. 10.1002/cbic.202000612. PubMed DOI PMC
Sarkar A.; Yang Y.; Vihinen M. Variation Benchmark Datasets: Update, Criteria, Quality and Applications. Database 2020, 2020, baz117. 10.1093/database/baz117. PubMed DOI PMC
Miton C. M.; Tokuriki N. How Mutational Epistasis Impairs Predictability in Protein Evolution and Design. Protein Sci. 2016, 25 (7), 1260–1272. 10.1002/pro.2876. PubMed DOI PMC
Wittmund M.; Cadet F.; Davari M. D. Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering. ACS Catal. 2022, 12 (22), 14243–14263. 10.1021/acscatal.2c01426. DOI
Yu H.; Ma S.; Li Y.; Dalby P. A. Hot Spots-Making Directed Evolution Easier. Biotechnol. Adv. 2022, 56, 107926. 10.1016/j.biotechadv.2022.107926. PubMed DOI
Sumbalova L.; Stourac J.; Martinek T.; Bednar D.; Damborsky J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46 (W1), W356–W362. 10.1093/nar/gky417. PubMed DOI PMC
Khersonsky O.; Lipsh R.; Avizemer Z.; Ashani Y.; Goldsmith M.; Leader H.; Dym O.; Rogotner S.; Trudeau D. L.; Prilusky J.; Amengual-Rigo P.; Guallar V.; Tawfik D. S.; Fleishman S. J. Automated Design of Efficient and Functionally Diverse Enzyme Repertoires. Mol. Cell 2018, 72 (1), 178–186.e5. 10.1016/j.molcel.2018.08.033. PubMed DOI PMC
Clifton B. E.; Kozome D.; Laurino P. Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design. Biochemistry 2023, 62 (2), 210–220. 10.1021/acs.biochem.1c00757. PubMed DOI
Hie B. L.; Shanker V. R.; Xu D.; Bruun T. U. J.; Weidenbacher P. A.; Tang S.; Wu W.; Pak J. E.; Kim P. S. Efficient Evolution of Human Antibodies from General Protein Language Models. Nat. Biotechnol. 2023, 10.1038/s41587-023-01763-2. PubMed DOI PMC
Goudy O. J.; Nallathambi A.; Kinjo T.; Randolph N.; Kuhlman B. In Silico Evolution of Protein Binders with Deep Learning Models for Structure Prediction and Sequence Design. bioRxiv 2023, 10.1101/2023.05.03.539278. DOI
Linder J.; Bogard N.; Rosenberg A. B.; Seelig G. A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences. Cell Syst 2020, 11 (1), 49–62.e16. 10.1016/j.cels.2020.05.007. PubMed DOI PMC
Szegedy C.; Zaremba W.; Sutskever I.; Bruna J.; Erhan D.; Goodfellow I.; Fergus R. Intriguing Properties of Neural Networks. arXiv [cs.CV] 2013, 10.48550/arXiv.1312.6199. DOI
Yu T.; Boob A. G.; Singh N.; Su Y.; Zhao H. In Vitro Continuous Protein Evolution Empowered by Machine Learning and Automation. Cell Syst 2023, 14, 633. 10.1016/j.cels.2023.04.006. PubMed DOI
Yang K. K.; Wu Z.; Arnold F. H. Machine-Learning-Guided Directed Evolution for Protein Engineering. Nat. Methods 2019, 16 (8), 687–694. 10.1038/s41592-019-0496-6. PubMed DOI
Wittmann B. J.; Yue Y.; Arnold F. H. Informed Training Set Design Enables Efficient Machine Learning-Assisted Directed Protein Evolution. Cell Syst 2021, 12 (11), 1026–1045.e7. 10.1016/j.cels.2021.07.008. PubMed DOI
Hie B.; Bryson B. D.; Berger B. Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design. Cell Syst 2020, 11 (5), 461–477.e9. 10.1016/j.cels.2020.09.007. PubMed DOI
Jain M.; Deleu T.; Hartford J.; Liu C.-H.; Hernandez-Garcia A.; Bengio Y. GFlowNets for AI-Driven Scientific Discovery. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.00615. DOI
Bengio E.; Jain M.; Korablyov M.; Precup D.; Bengio Y. Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation. Adv. Neural Inf. Process. Syst. 2021, 34, 27381–27394.
Qiu Y.; Wei G.-W. CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution. J. Chem. Inf. Model. 2022, 62 (19), 4629–4641. 10.1021/acs.jcim.2c01046. PubMed DOI
Alley E. C.; Khimulya G.; Biswas S.; AlQuraishi M.; Church G. M. Unified Rational Protein Engineering with Sequence-Based Deep Representation Learning. Nat. Methods 2019, 16 (12), 1315–1322. 10.1038/s41592-019-0598-1. PubMed DOI PMC
Biswas S.; Khimulya G.; Alley E. C.; Esvelt K. M.; Church G. M. Low-N Protein Engineering with Data-Efficient Deep Learning. Nat. Methods 2021, 18 (4), 389–396. 10.1038/s41592-021-01100-y. PubMed DOI
Hsu C.; Nisonoff H.; Fannjiang C.; Listgarten J. Learning Protein Fitness Models from Evolutionary and Assay-Labeled Data. Nat. Biotechnol. 2022, 40 (7), 1114–1122. 10.1038/s41587-021-01146-5. PubMed DOI
Zheng Z.; Deng Y.; Xue D.; Zhou Y.; Fei Y. E.; Gu Q. Structure-Informed Language Models Are Protein Designers. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.01649. DOI
Radford A.; Wu J.; Child R.; Luan D.; Amodei D.; Sutskever I. Language Models are Unsupervised Multitask Learners. Life-extension, 2020. https://life-extension.github.io/2020/05/27/GPT%E6%8A%80%E6%9C%AF%E5%88%9D%E6%8E%A2/language-models.pdf (accessed 2023-06-08).
Harris Z. S. Distributional Structure. Word World 1954, 10 (2–3), 146–162. 10.1080/00437956.1954.11659520. DOI
Elnaggar A.; Heinzinger M.; Dallago C.; Rehawi G.; Wang Y.; Jones L.; Gibbs T.; Feher T.; Angerer C.; Steinegger M.; Bhowmik D.; Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44 (10), 7112–7127. 10.1109/TPAMI.2021.3095381. PubMed DOI
Clifford J. N.; Høie M. H.; Deleuran S.; Peters B.; Nielsen M.; Marcatili P. BepiPred-3.0: Improved B-Cell Epitope Prediction Using Protein Language Models. Protein Sci. 2022, 31 (12), e4497. 10.1002/pro.4497. PubMed DOI PMC
Elnaggar A.; Essam H.; Salah-Eldin W.; Moustafa W.; Elkerdawy M.; Rochereau C.; Rost B. Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling. bioRxiv 2023, 10.1101/2023.01.16.524265. DOI
Pokharel S.; Pratyush P.; Heinzinger M.; Newman R. H.; Kc D. B. Improving Protein Succinylation Sites Prediction Using Embeddings from Protein Language Model. Sci. Rep. 2022, 12, 16933. 10.1038/s41598-022-21366-2. PubMed DOI PMC
Houlsby N.; Giurgiu A.; Jastrzebski S.; Morrone B.; De Laroussilhe Q.; Gesmundo A.; Attariyan M.; Gelly S. Parameter-Efficient Transfer Learning for NLP. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri K., Salakhutdinov R., Eds.; Proceedings of Machine Learning Research, Vol. 97; PMLR, 2019; pp 2790–2799.
Yang W.; Liu C.; Li Z. Lightweight Fine-Tuning a Pretrained Protein Language Model for Protein Secondary Structure Prediction. bioRxiv (Bioengineering), March 23, 2023, 2023.03.22.530066, ver. 1. 10.1101/2023.03.22.530066. DOI
Suzek B. E.; Wang Y.; Huang H.; McGarvey P. B.; Wu C. H.; UniProt Consortium. UniRef Clusters: A Comprehensive and Scalable Alternative for Improving Sequence Similarity Searches. Bioinformatics 2015, 31 (6), 926–932. 10.1093/bioinformatics/btu739. PubMed DOI PMC
Nijkamp E.; Ruffolo J.; Weinstein E. N.; Naik N.; Madani A. ProGen2: Exploring the Boundaries of Protein Language Models. arXiv [cs.LG] 2022, 10.48550/arXiv.2206.13517. PubMed DOI
Finn R. D.; Bateman A.; Clements J.; Coggill P.; Eberhardt R. Y.; Eddy S. R.; Heger A.; Hetherington K.; Holm L.; Mistry J.; Sonnhammer E. L. L.; Tate J.; Punta M. Pfam: The Protein Families Database. Nucleic Acids Res. 2014, 42, D222–D230. 10.1093/nar/gkt1223. PubMed DOI PMC
Joosten R. P.; Salzemann J.; Bloch V.; Stockinger H.; Berglund A.-C.; Blanchet C.; Bongcam-Rudloff E.; Combet C.; Da Costa A. L.; Deleage G.; Diarena M.; Fabbretti R.; Fettahi G.; Flegel V.; Gisel A.; Kasam V.; Kervinen T.; Korpelainen E.; Mattila K.; Pagni M.; Reichstadt M.; Breton V.; Tickle I. J.; Vriend G. PDB_REDO: Automated Re-Refinement of X-Ray Structure Models in the PDB. J. Appl. Crystallogr. 2009, 42, 376–384. 10.1107/S0021889809008784. PubMed DOI PMC
Dauparas J.; Anishchenko I.; Bennett N.; Bai H.; Ragotte R. J.; Milles L. F.; Wicky B. I. M.; Courbet A.; de Haas R. J.; Bethel N.; Leung P. J. Y.; Huddy T. F.; Pellock S.; Tischer D.; Chan F.; Koepnick B.; Nguyen H.; Kang A.; Sankaran B.; Bera A. K.; King N. P.; Baker D. Robust Deep Learning–Based Protein Sequence Design Using ProteinMPNN. Science 2022, 378 (6615), 49–56. 10.1126/science.add2187. PubMed DOI PMC
Sillitoe I.; Lewis T. E.; Cuff A.; Das S.; Ashford P.; Dawson N. L.; Furnham N.; Laskowski R. A.; Lee D.; Lees J. G.; Lehtinen S.; Studer R. A.; Thornton J.; Orengo C. A. CATH: Comprehensive Structural and Functional Annotations for Genome Sequences. Nucleic Acids Res. 2015, 43, D376–D381. 10.1093/nar/gku947. PubMed DOI PMC
Varadi M.; Anyango S.; Deshpande M.; Nair S.; Natassia C.; Yordanova G.; Yuan D.; Stroe O.; Wood G.; Laydon A.; Žídek A.; Green T.; Tunyasuvunakool K.; Petersen S.; Jumper J.; Clancy E.; Green R.; Vora A.; Lutfi M.; Figurnov M.; Cowie A.; Hobbs N.; Kohli P.; Kleywegt G.; Birney E.; Hassabis D.; Velankar S. AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models. Nucleic Acids Res. 2022, 50 (D1), D439–D444. 10.1093/nar/gkab1061. PubMed DOI PMC
Rao R. M.; Liu J.; Verkuil R.; Meier J.; Canny J.; Abbeel P.; Sercu T.; Rives A. MSA Transformer. In Proceedings of the 38th International Conference on Machine Learning; Meila M., Zhang T., Eds.; Proceedings of Machine Learning Research, Vol. 139; PMLR, 2021; pp 8844–8856.
Ho J.; Kalchbrenner N.; Weissenborn D.; Salimans T. Axial Attention in Multidimensional Transformers. arXiv [cs.CV] 2019, 10.48550/arXiv.1912.12180. DOI
Repecka D.; Jauniskis V.; Karpus L.; Rembeza E.; Rokaitis I.; Zrimec J.; Poviloniene S.; Laurynenas A.; Viknander S.; Abuajwa W.; Savolainen O.; Meskys R.; Engqvist M. K. M.; Zelezniak A. Expanding Functional Protein Sequence Spaces Using Generative Adversarial Networks. Nature Machine Intelligence 2021, 3 (4), 324–333. 10.1038/s42256-021-00310-5. DOI
Sevgen E.; Moller J.; Lange A.; Parker J.; Quigley S.; Mayer J.; Srivastava P.; Gayatri S.; Hosfield D.; Korshunova M.; Livne M.; Gill M.; Ranganathan R.; Costa A. B.; Ferguson A. L. ProT-VAE: Protein Transformer Variational AutoEncoder for Functional Protein Design. bioRxiv (Synthetic Biology), January 24, 2023, 2023.01.23.525232, ver. 1. 10.1101/2023.01.23.525232. DOI
Luo Y.; Jiang G.; Yu T.; Liu Y.; Vo L.; Ding H.; Su Y.; Qian W. W.; Zhao H.; Peng J. ECNet Is an Evolutionary Context-Integrated Deep Learning Framework for Protein Engineering. Nat. Commun. 2021, 12, 5743. 10.1038/s41467-021-25976-8. PubMed DOI PMC
Hochreiter S.; Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997, 9 (8), 1735–1780. 10.1162/neco.1997.9.8.1735. PubMed DOI
Baek M.; DiMaio F.; Anishchenko I.; Dauparas J.; Ovchinnikov S.; Lee G. R.; Wang J.; Cong Q.; Kinch L. N.; Schaeffer R. D.; Millán C.; Park H.; Adams C.; Glassman C. R.; DeGiovanni A.; Pereira J. H.; Rodrigues A. V.; van Dijk A. A.; Ebrecht A. C.; Opperman D. J.; Sagmeister T.; Buhlheller C.; Pavkov-Keller T.; Rathinaswamy M. K.; Dalwadi U.; Yip C. K.; Burke J. E.; Garcia K. C.; Grishin N. V.; Adams P. D.; Read R. J.; Baker D. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373 (6557), 871–876. 10.1126/science.abj8754. PubMed DOI PMC
Illig A.-M.; Siedhoff N. E.; Schwaneberg U.; Davari M. D. A Hybrid Model Combining Evolutionary Probability and Machine Learning Leverages Data-Driven Protein Engineering. bioRxiv 2022, 10.1101/2022.06.07.495081. DOI
Ding X.; Zou Z.; Brooks C. L., III. Deciphering Protein Evolution and Fitness Landscapes with Latent Space Models. Nat. Commun. 2019, 10, 5644. 10.1038/s41467-019-13633-0. PubMed DOI PMC
Kohout P.; Vasina M.; Majerova M.; Novakova V.; Damborsky J.; Bednar D.; et al. Design of Enzymes for Biocatalysis, Bioremediation and Biosensing Using Variational Autoencoder-Generated Latent Space. ChemRxiv. Cambridge: Cambridge Open Engage, 2023. 10.26434/chemrxiv-2023-jcds7. DOI
Ziegler C.; Martin J.; Sinner C.; Morcos F. Latent Generative Landscapes as Maps of Functional Diversity in Protein Sequence Space. Nat. Commun. 2023, 14, 2222. 10.1038/s41467-023-37958-z. PubMed DOI PMC
Moffat L.; Jones D. T. Increasing the Accuracy of Single Sequence Prediction Methods Using a Deep Semi-Supervised Learning Framework. Bioinformatics 2021, 37 (21), 3744–3751. 10.1093/bioinformatics/btab491. PubMed DOI PMC
Bepler T.; Berger B. Learning Protein Sequence Embeddings Using Information from Structure. International Conference on Learning Representations, New Orleans, LA, May 6–9, 2019; OpenReview, 2019. https://openreview.net/forum?id=SygLehCqtm
Rao R.; Bhattacharya N.; Thomas N.; Duan Y.; Chen X.; Canny J.; Abbeel P.; Song Y. S. Evaluating Protein Transfer Learning with TAPE. Adv. Neural Inf. Process. Syst. 2019, 32, 9689–9701. PubMed PMC
Crean R. M.; Gardner J. M.; Kamerlin S. C. L. Harnessing Conformational Plasticity to Generate Designer Enzymes. J. Am. Chem. Soc. 2020, 142 (26), 11324–11342. 10.1021/jacs.0c04924. PubMed DOI PMC
Guo H.-B.; Perminov A.; Bekele S.; Kedziora G.; Farajollahi S.; Varaljay V.; Hinkle K.; Molinero V.; Meister K.; Hung C.; Dennis P.; Kelley-Loughnane N.; Berry R. AlphaFold2 Models Indicate That Protein Sequence Determines Both Structure and Dynamics. Sci. Rep. 2022, 12, 10696. 10.1038/s41598-022-14382-9. PubMed DOI PMC
Faidon Brotzakis Z.; Zhang S.; Vendruscolo M. AlphaFold Prediction of Structural Ensembles of Disordered Proteins. bioRxiv (Biophysics), January 19, 2023, 2023.01.19.524720, ver. 1. 10.1101/2023.01.19.524720. DOI
Piana S.; Laio A. Advillin Folding Takes Place on a Hypersurface of Small Dimensionality. Phys. Rev. Lett. 2008, 101 (20), 208101. 10.1103/PhysRevLett.101.208101. PubMed DOI
Glielmo A.; Husic B. E.; Rodriguez A.; Clementi C.; Noé F.; Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem. Rev. 2021, 121 (16), 9722–9758. 10.1021/acs.chemrev.0c01195. PubMed DOI PMC
Mardt A.; Pasquali L.; Wu H.; Noé F. VAMPnets for Deep Learning of Molecular Kinetics. Nat. Commun. 2018, 9, 5. 10.1038/s41467-017-02388-1. PubMed DOI PMC
Marques S. M.; Kouba P.; Legrand A.; Sedlar J.; Disson L.; Planas-Iglesias J.; Sanusi Z.; Kunka A.; Damborsky J.; Pajdla T.; Prokop Z.; Mazurenko S.; Sivic J.; Bednar D. Effects of Alzheimer’s Disease Drug Candidates on Disordered Aβ42 Dissected by Comparative Markov State Analysis (CoVAMPnet). bioRxiv (Biophysics), January 6, 2023, 2023.01.06.523007, ver. 1. 10.1101/2023.01.06.523007. DOI
Ward M. D.; Zimmerman M. I.; Meller A.; Chung M.; Swamidass S. J.; Bowman G. R. Deep Learning the Structural Determinants of Protein Biochemical Properties by Comparing Structural Ensembles with DiffNets. Nat. Commun. 2021, 12, 3023. 10.1038/s41467-021-23246-1. PubMed DOI PMC
Akere A.; Chen S. H.; Liu X.; Chen Y.; Dantu S. C.; Pandini A.; Bhowmik D.; Haider S. Structure-Based Enzyme Engineering Improves Donor-Substrate Recognition of Arabidopsis Thaliana Glycosyltransferases. Biochem. J. 2020, 477 (15), 2791–2805. 10.1042/BCJ20200477. PubMed DOI PMC
Russ W. P.; Figliuzzi M.; Stocker C.; Barrat-Charlaix P.; Socolich M.; Kast P.; Hilvert D.; Monasson R.; Cocco S.; Weigt M.; Ranganathan R. An Evolution-Based Model for Designing Chorismate Mutase Enzymes. Science 2020, 369 (6502), 440–445. 10.1126/science.aba3304. PubMed DOI
Lu H.; Diaz D. J.; Czarnecki N. J.; Zhu C.; Kim W.; Shroff R.; Acosta D. J.; Alexander B. R.; Cole H. O.; Zhang Y.; Lynd N. A.; Ellington A. D.; Alper H. S. Machine Learning-Aided Engineering of Hydrolases for PET Depolymerization. Nature 2022, 604 (7907), 662–667. 10.1038/s41586-022-04599-z. PubMed DOI
Paik I.; Ngo P. H. T.; Shroff R.; Diaz D. J.; Maranhao A. C.; Walker D. J. F.; Bhadra S.; Ellington A. D. Improved Bst DNA Polymerase Variants Derived via a Machine Learning Approach. Biochemistry 2023, 62 (2), 410–418. 10.1021/acs.biochem.1c00451. PubMed DOI PMC
Weinstein J. J.; Goldenzweig A.; Hoch S.; Fleishman S. J. PROSS 2: A New Server for the Design of Stable and Highly Expressed Protein Variants. Bioinformatics 2021, 37 (1), 123–125. 10.1093/bioinformatics/btaa1071. PubMed DOI PMC
Musil M.; Stourac J.; Bendl J.; Brezovsky J.; Prokop Z.; Zendulka J.; Martinek T.; Bednar D.; Damborsky J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45 (W1), W393–W399. 10.1093/nar/gkx285. PubMed DOI PMC
Kunka A.; Marques S.; Havlasek M.; Vasina M.; Velatova N.; Cengelova L.; Kovar D.; Damborsky J.; Marek M.; Bednar D.; Prokop Z. Advancing Enzyme’s Stability and Catalytic Efficiency through Synergy of Force-Field Calculations, Evolutionary Analysis and Machine Learning. ACS Catal. 2023, 13, 12506–12518. 10.1021/acscatal.3c02575. PubMed DOI PMC
Wicky B. I. M.; Milles L. F.; Courbet A.; Ragotte R. J.; Dauparas J.; Kinfu E.; Tipps S.; Kibler R. D.; Baek M.; DiMaio F.; Li X.; Carter L.; Kang A.; Nguyen H.; Bera A. K.; Baker D. Hallucinating Symmetric Protein Assemblies. Science 2022, 378 (6615), 56–61. 10.1126/science.add1964. PubMed DOI PMC
Hawkins-Hooker A.; Depardieu F.; Baur S.; Couairon G.; Chen A.; Bikard D. Generating Functional Protein Variants with Variational Autoencoders. PLoS Comput. Biol. 2021, 17 (2), e1008736. 10.1371/journal.pcbi.1008736. PubMed DOI PMC
Vasina M.; Vanacek P.; Hon J.; Kovar D.; Faldynova H.; Kunka A.; Buryska T.; Badenhorst C. P. S.; Mazurenko S.; Bednar D.; Stavrakis S.; Bornscheuer U. T.; deMello A.; Damborsky J.; Prokop Z. Advanced Database Mining of Efficient Haloalkane Dehalogenases by Sequence and Structure Bioinformatics and Microfluidics. Chem. Catalysis 2022, 2 (10), 2704–2725. 10.1016/j.checat.2022.09.011. DOI
Pardo I.; Bednar D.; Calero P.; Volke D. C.; Damborský J.; Nikel P. I. A Nonconventional Archaeal Fluorinase Identified by In Silico Mining for Enhanced Fluorine Biocatalysis. ACS Catal. 2022, 12 (11), 6570–6577. 10.1021/acscatal.2c01184. PubMed DOI PMC
Yeh A. H.-W.; Norn C.; Kipnis Y.; Tischer D.; Pellock S. J.; Evans D.; Ma P.; Lee G. R.; Zhang J. Z.; Anishchenko I.; Coventry B.; Cao L.; Dauparas J.; Halabiya S.; DeWitt M.; Carter L.; Houk K. N.; Baker D. De Novo Design of Luciferases Using Deep Learning. Nature 2023, 614 (7949), 774–780. 10.1038/s41586-023-05696-3. PubMed DOI PMC
Büchler J.; Malca S. H.; Patsch D.; Voss M.; Turner N. J.; Bornscheuer U. T.; Allemann O.; Le Chapelain C.; Lumbroso A.; Loiseleur O.; Buller R. Algorithm-Aided Engineering of Aliphatic Halogenase WelO5* for the Asymmetric Late-Stage Functionalization of Soraphens. Nat. Commun. 2022, 13, 371. 10.1038/s41467-022-27999-1. PubMed DOI PMC
Saito Y.; Oikawa M.; Sato T.; Nakazawa H.; Ito T.; Kameda T.; Tsuda K.; Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal. 2021, 11 (23), 14615–14624. 10.1021/acscatal.1c03753. DOI
Greenhalgh J. C.; Fahlberg S. A.; Pfleger B. F.; Romero P. A. Machine Learning-Guided Acyl-ACP Reductase Engineering for Improved in Vivo Fatty Alcohol Production. Nat. Commun. 2021, 12, 5825. 10.1038/s41467-021-25831-w. PubMed DOI PMC
Schenkmayerova A.; Pinto G. P.; Toul M.; Marek M.; Hernychova L.; Planas-Iglesias J.; Daniel Liskova V.; Pluskal D.; Vasina M.; Emond S.; Dörr M.; Chaloupkova R.; Bednar D.; Prokop Z.; Hollfelder F.; Bornscheuer U. T.; Damborsky J. Engineering the Protein Dynamics of an Ancestral Luciferase. Nat. Commun. 2021, 12, 3616. 10.1038/s41467-021-23450-z. PubMed DOI PMC
Chaloupkova R.; Liskova V.; Toul M.; Markova K.; Sebestova E.; Hernychova L.; Marek M.; Pinto G. P.; Pluskal D.; Waterman J.; Prokop Z.; Damborsky J. Light-Emitting Dehalogenases: Reconstruction of Multifunctional Biocatalysts. ACS Catal. 2019, 9 (6), 4810–4823. 10.1021/acscatal.9b01031. DOI
Klesmith J. R.; Bacik J.-P.; Wrenbeck E. E.; Michalczyk R.; Whitehead T. A. Trade-Offs between Enzyme Fitness and Solubility Illuminated by Deep Mutational Scanning. Proc. Natl. Acad. Sci. U. S. A. 2017, 114 (9), 2265–2270. 10.1073/pnas.1614437114. PubMed DOI PMC
MacLeod B. P.; Parlane F. G. L.; Rupnow C. C.; Dettelbach K. E.; Elliott M. S.; Morrissey T. D.; Haley T. H.; Proskurin O.; Rooney M. B.; Taherimakhsousi N.; Dvorak D. J.; Chiu H. N.; Waizenegger C. E. B.; Ocean K.; Mokhtari M.; Berlinguette C. P. A Self-Driving Laboratory Advances the Pareto Front for Material Properties. Nat. Commun. 2022, 13, 995. 10.1038/s41467-022-28580-6. PubMed DOI PMC
Li W.; Yao X.; Zhang T.; Wang R.; Wang L. Hierarchy Ranking Method for Multimodal Multi-Objective Optimization with Local Pareto Fronts. IEEE Trans. Evol. Computat. 2023, 27, 98. 10.1109/TEVC.2022.3155757. DOI
Miton C. M.; Tokuriki N. Insertions and Deletions (Indels): A Missing Piece of the Protein Engineering Jigsaw. Biochemistry 2023, 62 (2), 148–157. 10.1021/acs.biochem.2c00188. PubMed DOI
Gonzalez C. E.; Roberts P.; Ostermeier M. Fitness Effects of Single Amino Acid Insertions and Deletions in TEM-1 β-Lactamase. J. Mol. Biol. 2019, 431 (12), 2320–2330. 10.1016/j.jmb.2019.04.030. PubMed DOI PMC
Fan X.; Pan H.; Tian A.; Chung W. K.; Shen Y. SHINE: Protein Language Model-Based Pathogenicity Prediction for Short Inframe Insertion and Deletion Variants. Brief. Bioinform. 2023, 24 (1), bbac584. 10.1093/bib/bbac584. PubMed DOI PMC
Ross C. M.; Foley G.; Boden M.; Gillam E. M. J. Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol. Biol. 2022, 2397, 85–110. 10.1007/978-1-0716-1826-4_6. PubMed DOI
Park H.-S.; Nam S.-H.; Lee J. K.; Yoon C. N.; Mannervik B.; Benkovic S. J.; Kim H.-S. Design and Evolution of New Catalytic Activity with an Existing Protein Scaffold. Science 2006, 311 (5760), 535–538. 10.1126/science.1118953. PubMed DOI
Babkova P.; Sebestova E.; Brezovsky J.; Chaloupkova R.; Damborsky J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. Chembiochem 2017, 18 (14), 1448–1456. 10.1002/cbic.201700197. PubMed DOI
Arpino J. A. J.; Rizkallah P. J.; Jones D. D. Structural and Dynamic Changes Associated with Beneficial Engineered Single-Amino-Acid Deletion Mutations in Enhanced Green Fluorescent Protein. Acta Crystallogr. D Biol. Crystallogr. 2014, 70 (8), 2152–2162. 10.1107/S139900471401267X. PubMed DOI PMC
Dumas A.; Lercher L.; Spicer C. D.; Davis B. G. Designing Logical Codon Reassignment - Expanding the Chemistry in Biology. Chem. Sci. 2015, 6 (1), 50–69. 10.1039/C4SC01534G. PubMed DOI PMC
Hankore E. D.; Zhang L.; Chen Y.; Liu K.; Niu W.; Guo J. Genetic Incorporation of Noncanonical Amino Acids Using Two Mutually Orthogonal Quadruplet Codons. ACS Synth. Biol. 2019, 8 (5), 1168–1174. 10.1021/acssynbio.9b00051. PubMed DOI PMC
An X.; Chen C.; Wang T.; Huang A.; Zhang D.; Han M.-J.; Wang J. Genetic Incorporation of Selenotyrosine Significantly Improves Enzymatic Activity of Agrobacterium Radiobacter Phosphotriesterase. Chembiochem 2021, 22 (15), 2535–2539. 10.1002/cbic.202000460. PubMed DOI
Zhang H.; Zheng Z.; Dong L.; Shi N.; Yang Y.; Chen H.; Shen Y.; Xia Q. Rational Incorporation of Any Unnatural Amino Acid into Proteins by Machine Learning on Existing Experimental Proofs. Comput. Struct. Biotechnol. J. 2022, 20, 4930–4941. 10.1016/j.csbj.2022.08.063. PubMed DOI PMC
Gainza P.; Sverrisson F.; Monti F.; Rodolà E.; Boscaini D.; Bronstein M. M.; Correia B. E. Deciphering Interaction Fingerprints from Protein Molecular Surfaces Using Geometric Deep Learning. Nat. Methods 2020, 17 (2), 184–192. 10.1038/s41592-019-0666-6. PubMed DOI
Ketata M. A.; Laue C.; Mammadov R.; Stark H.; Wu M.; Corso G.; Marquet C.; Barzilay R.; Jaakkola T. S.. DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models. In The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=AM7WbQxuRS
Geng C.; Xue L. C.; Roel-Touris J.; Bonvin A. M. J. J. Finding the ΔΔG Spot: Are Predictors of Binding Affinity Changes upon Mutations in Protein–Protein Interactions Ready for It?. WIREs Comput. Mol. Sci. 2019, 9 (5), e1410. 10.1002/wcms.1410. DOI
Jiang Y.; Quan L.; Li K.; Li Y.; Zhou Y.; Wu T.; Lyu Q. DGCddG: Deep Graph Convolution for Predicting Protein-Protein Binding Affinity Changes Upon Mutations. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20 (3), 2089–2100. 10.1109/TCBB.2022.3233627. PubMed DOI
Shan S.; Luo S.; Yang Z.; Hong J.; Su Y.; Ding F.; Fu L.; Li C.; Chen P.; Ma J.; Shi X.; Zhang Q.; Berger B.; Zhang L.; Peng J. Deep Learning Guided Optimization of Human Antibody against SARS-CoV-2 Variants with Broad Neutralization. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (11), e2122954119. 10.1073/pnas.2122954119. PubMed DOI PMC
Jin W.; Sarkizova S.; Chen X.; Hacohen N.; Uhler C. Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler’s Rotation Equation. arXiv [q-bio.BM] 2023, 10.48550/arXiv.2301.10814. DOI
Jiang Y.; Neti S. S.; Sitarik I.; Pradhan P.; To P.; Xia Y.; Fried S. D.; Booker S. J.; O’Brien E. P. How Synonymous Mutations Alter Enzyme Structure and Function over Long Timescales. Nat. Chem. 2023, 15 (3), 308–318. 10.1038/s41557-022-01091-z. PubMed DOI PMC
Nikolados E.-M.; Oyarzún D. A. Deep Learning for Optimization of Protein Expression. Curr. Opin. Biotechnol. 2023, 81, 102941. 10.1016/j.copbio.2023.102941. PubMed DOI
Rosenberg A. A.; Marx A.; Bronstein A. M. Codon-Specific Ramachandran Plots Show Amino Acid Backbone Conformation Depends on Identity of the Translated Codon. Nat. Commun. 2022, 13, 2815. 10.1038/s41467-022-30390-9. PubMed DOI PMC
Saunders R.; Deane C. M. Synonymous Codon Usage Influences the Local Protein Structure Observed. Nucleic Acids Res. 2010, 38 (19), 6719–6728. 10.1093/nar/gkq495. PubMed DOI PMC
Outeiral C.; Deane C. M. Codon Language Embeddings Provide Strong Signals for Protein Engineering. bioRxiv 2022, 10.1101/2022.12.15.519894. DOI
Constant D. A.; Gutierrez J. M.; Sastry A. V.; Viazzo R.; Smith N. R.; Hossain J.; Spencer D. A.; Carter H.; Ventura A. B.; Louie M. T. M.; Kohnert C.; Consbruck R.; Bennett J.; Crawford K. A.; Sutton J. M.; Morrison A.; Steiger A. K.; Jackson K. A.; Stanton J. T.; Abdulhaqq S.; Hannum G.; Meier J.; Weinstock M.; Gander M.. Deep Learning-Based Codon Optimization with Large-Scale Synonymous Variant Datasets Enables Generalized Tunable Protein Expression. bioRxiv (Synthetic Biology), February 12, 2023, 2023.02.11.528149, ver. 1. 10.1101/2023.02.11.528149. DOI
Ruscio J. Z.; Kohn J. E.; Ball K. A.; Head-Gordon T. The Influence of Protein Dynamics on the Success of Computational Enzyme Design. J. Am. Chem. Soc. 2009, 131 (39), 14111–14115. 10.1021/ja905396s. PubMed DOI PMC
Peccati F.; Alunno-Rufini S.; Jiménez-Osés G. Accurate Prediction of Enzyme Thermostabilization with Rosetta Using AlphaFold Ensembles. J. Chem. Inf. Model. 2023, 63 (3), 898–909. 10.1021/acs.jcim.2c01083. PubMed DOI PMC
Acevedo-Rocha C. G.; Li A.; D’Amore L.; Hoebenreich S.; Sanchis J.; Lubrano P.; Ferla M. P.; Garcia-Borràs M.; Osuna S.; Reetz M. T. Pervasive Cooperative Mutational Effects on Multiple Catalytic Enzyme Traits Emerge via Long-Range Conformational Dynamics. Nat. Commun. 2021, 12, 1621. 10.1038/s41467-021-21833-w. PubMed DOI PMC
Bonk B. M.; Weis J. W.; Tidor B. Machine Learning Identifies Chemical Characteristics That Promote Enzyme Catalysis. J. Am. Chem. Soc. 2019, 141 (9), 4108–4118. 10.1021/jacs.8b13879. PubMed DOI PMC
Zhong E. D.; Bepler T.; Berger B.; Davis J. H. CryoDRGN: Reconstruction of Heterogeneous Cryo-EM Structures Using Neural Networks. Nat. Methods 2021, 18 (2), 176–185. 10.1038/s41592-020-01049-4. PubMed DOI PMC
Jia K.; Kilinc M.; Jernigan R. L. Functional Protein Dynamics Directly from Sequences. J. Phys. Chem. B 2023, 127 (9), 1914–1921. 10.1021/acs.jpcb.2c05766. PubMed DOI PMC
Wang T.; Zhu J.-Y.; Torralba A.; Efros A. A. Dataset Distillation. arXiv [cs.LG] 2018, 10.48550/arXiv.1811.10959. DOI
Hinton G.; Vinyals O.; Dean J. Distilling the Knowledge in a Neural Network. arXiv [stat.ML] 2015, 10.48550/arXiv.1503.02531. DOI
Deng L. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]. IEEE Signal Process. Mag. 2012, 29 (6), 141–142. 10.1109/MSP.2012.2211477. DOI
Le T.-T.-H.; Larasati H. T.; Prihatno A. T.; Kim H.. A Review of Dataset Distillation for Deep Learning. In Proceedings of the 2022 International Conference on Platform Technology and Service (PlatCon), Jeju, South Korea, August 22–24, 2022; IEEE, 2022; pp 34–37. 10.1109/PlatCon55845.2022.9932086 DOI
Yu R.; Liu S.; Wang X. Dataset Distillation: A Comprehensive Review. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.07014. PubMed DOI
Lei S.; Tao D. A Comprehensive Survey of Dataset Distillation. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.05603. PubMed DOI
Abraham M.; Apostolov R.; Barnoud J.; Bauer P.; Blau C.; Bonvin A. M. J. J.; Chavent M.; Chodera J.; Čondić-Jurkić K.; Delemotte L.; Grubmüller H.; Howard R. J.; Jordan E. J.; Lindahl E.; Ollila O. H. S.; Selent J.; Smith D. G. A.; Stansfeld P. J.; Tiemann J. K. S.; Trellet M.; Woods C.; Zhmurov A. Sharing Data from Molecular Simulations. J. Chem. Inf. Model. 2019, 59 (10), 4093–4099. 10.1021/acs.jcim.9b00665. PubMed DOI
Serafeim A.-P.; Salamanos G.; Patapati K. K.; Glykos N. M. Sensitivity of Folding Molecular Dynamics Simulations to Even Minor Force Field Changes. J. Chem. Inf. Model. 2016, 56 (10), 2035–2041. 10.1021/acs.jcim.6b00493. PubMed DOI
Wilkinson M. D.; Dumontier M.; Aalbersberg I. J. J.; Appleton G.; Axton M.; Baak A.; Blomberg N.; Boiten J.-W.; da Silva Santos L. B.; Bourne P. E.; Bouwman J.; Brookes A. J.; Clark T.; Crosas M.; Dillo I.; Dumon O.; Edmunds S.; Evelo C. T.; Finkers R.; Gonzalez-Beltran A.; Gray A. J. G.; Groth P.; Goble C.; Grethe J. S.; Heringa J.; ’t Hoen P. A. C.; Hooft R.; Kuhn T.; Kok R.; Kok J.; Lusher S. J.; Martone M. E.; Mons A.; Packer A. L.; Persson B.; Rocca-Serra P.; Roos M.; van Schaik R.; Sansone S.-A.; Schultes E.; Sengstag T.; Slater T.; Strawn G.; Swertz M. A.; Thompson M.; van der Lei J.; van Mulligen E.; Velterop J.; Waagmeester A.; Wittenburg P.; Wolstencroft K.; Zhao J.; Mons B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. 10.1038/sdata.2016.18. PubMed DOI PMC
Tiemann J. K. S.; Szczuka M.; Bouarroudj L.; Oussaren M.; Garcia S.; Howard R. J.; Delemotte L.; Lindahl E.; Baaden M.; Lindorff-Larsen K.; Chavent M.; Poulain P. MDverse: Shedding Light on the Dark Matter of Molecular Dynamics Simulations. bioRxiv 2023, 10.1101/2023.05.02.538537. PubMed DOI PMC
Durumeric A. E. P.; Charron N. E.; Templeton C.; Musil F.; Bonneau K.; Pasos-Trejo A. S.; Chen Y.; Kelkar A.; Noé F.; Clementi C. Machine Learned Coarse-Grained Protein Force-Fields: Are We There Yet?. Curr. Opin. Struct. Biol. 2023, 79, 102533. 10.1016/j.sbi.2023.102533. PubMed DOI PMC
Beyer L.; Hénaff O. J.; Kolesnikov A.; Zhai X.; van den Oord A. Are We Done with ImageNet?. arXiv [cs.CV] 2020, 10.48550/arXiv.2006.07159. DOI
Everingham M.; Van Gool L.; Williams C. K. I.; Winn J.; Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303. 10.1007/s11263-009-0275-4. DOI
Deng J.; Dong W.; Socher R.; Li L.-J.; Li K.; Fei-Fei L.. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, June 20–25, 2009; IEEE, 2009; pp 248–255. 10.1109/CVPR.2009.5206848 DOI
LeCun Y.; Haffner P.; Bottou L.; Bengio Y.. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision; Springer, 1999; pp 319–345.
He K.; Zhang X.; Ren S.; Sun J.. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016; IEEE, 2016; pp 770–778. 10.1109/CVPR.2016.90 DOI
Thiyagalingam J.; Shankar M.; Fox G.; Hey T. Scientific Machine Learning Benchmarks. Nature Reviews Physics 2022, 4 (6), 413–420. 10.1038/s42254-022-00441-7. DOI
Steinegger M.; Söding J. Clustering Huge Protein Sequence Sets in Linear Time. Nat. Commun. 2018, 9, 2542. 10.1038/s41467-018-04964-5. PubMed DOI PMC
Gao M.; Skolnick J. Structural Space of Protein-Protein Interfaces Is Degenerate, Close to Complete, and Highly Connected. Proc. Natl. Acad. Sci. U. S. A. 2010, 107 (52), 22517–22522. 10.1073/pnas.1012820107. PubMed DOI PMC
Burra P. V.; Zhang Y.; Godzik A.; Stec B. Global Distribution of Conformational States Derived from Redundant Models in the PDB Points to Non-Uniqueness of the Protein Structure. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (26), 10505–10510. 10.1073/pnas.0812152106. PubMed DOI PMC
Robin X.; Leemann M.; Sagasta A.; Eberhardt J.; Schwede T.; Durairaj J. Automated Benchmarking of Combined Protein Structure and Ligand Conformation Prediction. Authorea Preprints 2023, 10.22541/au.168382988.85108031/v1. PubMed DOI
Wang R.; Fang X.; Lu Y.; Yang C.-Y.; Wang S. The PDBbind Database: Methodologies and Updates. J. Med. Chem. 2005, 48 (12), 4111–4119. 10.1021/jm048957q. PubMed DOI
Dallago C.; Mou J.; Johnston K. E.; Wittmann B.; Bhattacharya N.; Goldman S.; Madani A.; Yang K. K.. FLIP: Benchmark Tasks in Fitness Landscape Inference for Proteins. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, Vol. 1; Vanschoren J., Yeung S., Eds.; Curran Associates, Inc.: Red Hook, NY, 2021.
Morehead A.; Chen C.; Sedova A.; Cheng J. DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction. arXiv [q-bio.QM] 2021, 10.48550/arXiv.2106.04362. PubMed DOI PMC
Kryshtafovych A.; Schwede T.; Topf M.; Fidelis K.; Moult J. Critical Assessment of Methods of Protein Structure Prediction (CASP)-Round XIV. Proteins 2021, 89 (12), 1607–1617. 10.1002/prot.26237. PubMed DOI PMC
Janin J.; Henrick K.; Moult J.; Eyck L. T.; Sternberg M. J. E.; Vajda S.; Vakser I.; Wodak S. J. Critical Assessment of PRedicted Interactions. CAPRI: A Critical Assessment of PRedicted Interactions. Proteins 2003, 52 (1), 2–9. 10.1002/prot.10381. PubMed DOI
Andreoletti G.; Hoskins R. A.; Repo S.; Barsky D.; Brenner S. E.; Moult J.; Participants C. Abstract 3295: CAGI: The Critical Assessment of Genome Interpretation, a Community Experiment to Evaluate Phenotype Prediction: Implications for Predicting Impact of Variants in Cancer. Cancer Res. 2018, 78, 3295–3295. 10.1158/1538-7445.AM2018-3295. DOI
Grešová K.; Martinek V.; Čechák D.; Šimeček P.; Alexiou P. Genomic Benchmarks: A Collection of Datasets for Genomic Sequence Classification. BMC Genomic Data 2023, 24, 25. 10.1186/s12863-023-01123-8. PubMed DOI PMC
Buterez D.; Janet J. P.; Kiddle S. J.; Liò P. MF-PCBA: Multifidelity High-Throughput Screening Benchmarks for Drug Discovery and Machine Learning. J. Chem. Inf. Model. 2023, 63 (9), 2667–2678. 10.1021/acs.jcim.2c01569. PubMed DOI PMC
Walsh I.; Fishman D.; Garcia-Gasulla D.; Titma T.; Pollastri G.; Capriotti E.; Casadio R.; Capella-Gutierrez S.; Cirillo D.; Del Conte A.; Dimopoulos A. C.; Del Angel V. D.; Dopazo J.; Fariselli P.; Fernandez J. M.; Huber F.; Kreshuk A.; Lenaerts T.; Martelli P. L.; Navarro A.; Broin P. O; Pinero J.; Piovesan D.; Reczko M.; Ronzano F.; Satagopam V.; Savojardo C.; Spiwok V.; Tangaro M. A.; Tartari G.; Salgado D.; Valencia A.; Zambelli F.; Harrow J.; Psomopoulos F. E.; Tosatto S. C. E. DOME: Recommendations for Supervised Machine Learning Validation in Biology. Nat. Methods 2021, 18 (10), 1122–1127. 10.1038/s41592-021-01205-4. PubMed DOI
Mirdita M.; Schütze K.; Moriwaki Y.; Heo L.; Ovchinnikov S.; Steinegger M. ColabFold: Making Protein Folding Accessible to All. Nat. Methods 2022, 19 (6), 679–682. 10.1038/s41592-022-01488-1. PubMed DOI PMC
Lee B. D.; Gitter A.; Greene C. S.; Raschka S.; Maguire F.; Titus A. J.; Kessler M. D.; Lee A. J.; Chevrette M. G.; Stewart P. A.; Britto-Borges T.; Cofer E. M.; Yu K.-H.; Carmona J. J.; Fertig E. J.; Kalinin A. A.; Signal B.; Lengerich B. J.; Triche T. J. Jr; Boca S. M. Ten Quick Tips for Deep Learning in Biology. PLoS Comput. Biol. 2022, 18 (3), e1009803. 10.1371/journal.pcbi.1009803. PubMed DOI PMC
Samek W.; Müller K.-R.. Towards Explainable Artificial Intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Samek W., Montavon G., Vedaldi A., Hansen L. K., Müller K.-R., Eds.; Springer, 2019; pp 5–22.
Wellawatte G. P.; Gandhi H. A.; Seshadri A.; White A. D. A Perspective on Explanations of Molecular Prediction Models. J. Chem. Theory Comput. 2023, 19 (8), 2149–2160. 10.1021/acs.jctc.2c01235. PubMed DOI PMC
Holzinger A.; Saranti A.; Molnar C.; Biecek P.; Samek W.. Explainable AI Methods - A Brief Overview. In xxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers; Holzinger A., Goebel R., Fong R., Moon T., Müller K.-R., Samek W., Eds.; Springer, 2022; pp 13–38.
Montavon G.; Binder A.; Lapuschkin S.; Samek W.; Müller K.-R.. Layer-Wise Relevance Propagation: An Overview. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Samek W., Montavon G., Vedaldi A., Hansen L. K., Müller K.-R., Eds.; Springer, 2019; pp 193–209.
van der Zanden T. C.; Bodlaender H. L.; Hamers H. J. M. Efficiently Computing the Shapley Value of Connectivity Games in Low-Treewidth Graphs. Oper. Res. Int. J. 2023, 23, 6. 10.1007/s12351-023-00742-4. DOI
Ribeiro M. T.; Singh S.; Guestrin C.. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD ’16 San Francisco, CA, August 13–17, 2016; Association for Computing Machinery: New York, NY, 2016; pp 1135–1144.
Ivanovs M.; Kadikis R.; Ozols K. Perturbation-Based Methods for Explaining Deep Neural Networks: A Survey. Pattern Recognit. Lett. 2021, 150, 228–234. 10.1016/j.patrec.2021.06.030. DOI
Ma J.; Yu M. K.; Fong S.; Ono K.; Sage E.; Demchak B.; Sharan R.; Ideker T. Using Deep Learning to Model the Hierarchical Structure and Function of a Cell. Nat. Methods 2018, 15 (4), 290–298. 10.1038/nmeth.4627. PubMed DOI PMC
Novakovsky G.; Dexter N.; Libbrecht M. W.; Wasserman W. W.; Mostafavi S. Obtaining Genetics Insights from Deep Learning via Explainable Artificial Intelligence. Nat. Rev. Genet. 2023, 24 (2), 125–137. 10.1038/s41576-022-00532-2. PubMed DOI
Fortelny N.; Bock C. Knowledge-Primed Neural Networks Enable Biologically Interpretable Deep Learning on Single-Cell Sequencing Data. Genome Biol. 2020, 21, 190. 10.1186/s13059-020-02100-5. PubMed DOI PMC
Nikolados E.-M.; Wongprommoon A.; Aodha O. M.; Cambray G.; Oyarzún D. A. Accuracy and Data Efficiency in Deep Learning Models of Protein Expression. Nat. Commun. 2022, 13, 7755. 10.1038/s41467-022-34902-5. PubMed DOI PMC
Xu F.; Uszkoreit H.; Du Y.; Fan W.; Zhao D.; Zhu J.. Explainable AI: A Brief Survey on History, Research Areas, Approaches and Challenges. In Natural Language Processing and Chinese Computing; Springer, 2019; pp 563–574.
Shimazaki T.; Tachikawa M. Collaborative Approach between Explainable Artificial Intelligence and Simplified Chemical Interactions to Explore Active Ligands for Cyclin-Dependent Kinase 2. ACS Omega 2022, 7 (12), 10372–10381. 10.1021/acsomega.1c06976. PubMed DOI PMC
Probst D. Explainable Prediction of Catalysing Enzymes from Reactions Using Multilayer Perceptrons. bioRxiv (Bioinformatics), January 30, 2023, 2023.01.28.526009, ver. 1. 10.1101/2023.01.28.526009. DOI
Li C.; Liu J.; Chen J.; Yuan Y.; Yu J.; Gou Q.; Guo Y.; Pu X. An Interpretable Convolutional Neural Network Framework for Analyzing Molecular Dynamics Trajectories: A Case Study on Functional States for G-Protein-Coupled Receptors. J. Chem. Inf. Model. 2022, 62 (6), 1399–1410. 10.1021/acs.jcim.2c00085. PubMed DOI
Tan J.; Zhang Y.. ExplainableFold: Understanding AlphaFold Prediction with Explainable AI. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining KDD ’23; Association for Computing Machinery: New York, NY, 2023; pp 2166–2176.
Hoover B.; Strobelt H.; Gehrmann S.. ExBERT: A VIsual ANalysis TOol to EXplore LEarned REpresentations in TRansformer MOdels. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations; Association for Computational Linguistics, 2020; pp 187–196.
Ferruz N.; Höcker B. Controllable Protein Design with Language Models. Nature Machine Intelligence 2022, 4 (6), 521–532. 10.1038/s42256-022-00499-z. DOI
Abd Elrahman S. M.; Abraham A. A review of class imbalance problem. J. Netw. Innov. Comput. 2013, 1, 332–340.
Haixiang G.; Yijing L.; Shang J.; Mingyun G.; Yuanyue H.; Bing G. Learning from Class-Imbalanced Data: Review of Methods and Applications. Expert Syst. Appl. 2017, 73, 220–239. 10.1016/j.eswa.2016.12.035. DOI
Kaur H.; Pannu H. S.; Malhi A. K. A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Comput. Surv. 2020, 52 (4), 1–36. 10.1145/3343440. DOI
Esposito C.; Landrum G. A.; Schneider N.; Stiefl N.; Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J. Chem. Inf. Model. 2021, 61 (6), 2623–2640. 10.1021/acs.jcim.1c00160. PubMed DOI
Weidinger L.; Mellor J.; Rauh M.; Griffin C.; Uesato J.; Huang P.-S.; Cheng M.; Glaese M.; Balle B.; Kasirzadeh A.; Kenton Z.; Brown S.; Hawkins W.; Stepleton T.; Biles C.; Birhane A.; Haas J.; Rimell L.; Hendricks L. A.; Isaac W.; Legassick S.; Irving G.; Gabriel I.. Ethical and Social Risks of Harm from Language Models. arXiv (Computer Science.Computation and Language), December 8, 2021, 2112.04359, ver. 1. 10.48550/arXiv.2112.04359. DOI
Kessler M. D.; Yerges-Armstrong L.; Taub M. A.; Shetty A. C.; Maloney K.; Jeng L. J. B.; Ruczinski I.; Levin A. M.; Williams L. K.; Beaty T. H.; Mathias R. A.; Barnes K. C.; et al. Challenges and Disparities in the Application of Personalized Genomic Medicine to Populations with African Ancestry. Nat. Commun. 2016, 7, 12521. 10.1038/ncomms12521. PubMed DOI PMC
Sullivan B. J.; Nguyen T.; Durani V.; Mathur D.; Rojas S.; Thomas M.; Syu T.; Magliery T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420 (4–5), 384–399. 10.1016/j.jmb.2012.04.025. PubMed DOI PMC
Fang J. A Critical Review of Five Machine Learning-Based Algorithms for Predicting Protein Stability Changes upon Mutation. Brief. Bioinform. 2020, 21 (4), 1285–1292. 10.1093/bib/bbz071. PubMed DOI PMC
Pucci F.; Bernaerts K. V.; Kwasigroch J. M.; Rooman M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34 (21), 3659–3665. 10.1093/bioinformatics/bty348. PubMed DOI
Caldararu O.; Blundell T. L.; Kepp K. P. A Base Measure of Precision for Protein Stability Predictors: Structural Sensitivity. BMC Bioinformatics 2021, 22, 88. 10.1186/s12859-021-04030-w. PubMed DOI PMC
Scantlebury J.; Vost L.; Carbery A.; Hadfield T. E.; Turnbull O. M.; Brown N.; Chenthamarakshan V.; Das P.; Grosjean H.; von Delft F.; Deane C. M. A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening. J. Chem. Inf. Model. 2023, 63 (10), 2960–2974. 10.1021/acs.jcim.3c00322. PubMed DOI PMC
Hebert-Johnson U.; Kim M.; Reingold O.; Rothblum G.. Multicalibration: Calibration for the (COmputationally-Identifiable) Masses. In Proceedings of the 35th International Conference on Machine Learning; Dy J., Krause A., Eds.; Proceedings of Machine Learning Research; PMLR, 10–15 Jul 2018; Vol. 80, pp 1939–1948.
Gopalan P.; Kim M. P.; Singhal M. A.; Zhao S.. Low-Degree Multicalibration. In Proceedings of Thirty Fifth Conference on Learning Theory; Loh P.-L., Raginsky M., Eds.; Proceedings of Machine Learning Research, Vol. 178; PMLR, 2022; pp 3193–3234.
Kim M. P.; Ghorbani A.; Zou J.. Multiaccuracy: Black-Box Post-Processing for Fairness in Classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society AIES ’19; Association for Computing Machinery: New York, NY, 2019; pp 247–254.
Pessach D.; Shmueli E. Algorithmic Fairness. arXiv [cs.CY] 2020, 10.48550/arXiv.2001.09784. DOI
Minot M.; Reddy S. T.. Meta Learning Improves Robustness and Performance in Machine Learning-Guided Protein Engineering. bioRxiv, January 30, 2023, 2023.01.30.526201, ver. 1. 10.1101/2023.01.30.526201. DOI
Musaelian A.; Johansson A.; Batzner S.; Kozinsky B. Scaling the Leading Accuracy of Deep Equivariant Models to Biomolecular Simulations of Realistic Size. arXiv [physics.comp-ph] 2023, 10.48550/arXiv.2304.10061. DOI
Shaw D. E.; Adams P. J.; Azaria A.; Bank J. A.; Batson B.; Bell A.; Bergdorf M.; Bhatt J.; Butts J. A.; Correia T.; Dirks R. M.; Dror R. O.; Eastwood M. P.; Edwards B.; Even A.; Feldmann P.; Fenn M.; Fenton C. H.; Forte A.; Gagliardo J.; Gill G.; Gorlatova M.; Greskamp B.; Grossman J. P.; Gullingsrud J.; Harper A.; Hasenplaugh W.; Heily M.; Heshmat B. C.; Hunt J.; Ierardi D. J.; Iserovich L.; Jackson B. L.; Johnson N. P.; Kirk M. M.; Klepeis J. L.; Kuskin J. S.; Mackenzie K. M.; Mader R. J.; McGowen R.; McLaughlin A.; Moraes M. A.; Nasr M. H.; Nociolo L. J.; O’Donnell L.; Parker A.; Peticolas J. L.; Pocina G.; Predescu C.; Quan T.; Salmon J. K.; Schwink C.; Shim K. S.; Siddique N.; Spengler J.; Szalay T.; Tabladillo R.; Tartler R.; Taube A. G.; Theobald M.; Towles B.; Vick W.; Wang S. C.; Wazlowski M.; Weingarten M. J.; Williams J. M.; Yuh K. A.. Anton 3: Twenty Microseconds of Molecular Dynamics Simulation before Lunch. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis SC ’21; Association for Computing Machinery: New York, NY, 2021; pp 1–11.
Perdomo-Ortiz A.; Benedetti M.; Realpe-Gómez J.; Biswas R. Opportunities and Challenges for Quantum-Assisted Machine Learning in near-Term Quantum Computers. Quantum Sci. Technol. 2018, 3 (3), 030502. 10.1088/2058-9565/aab859. DOI
Caro M. C.; Huang H.-Y.; Cerezo M.; Sharma K.; Sornborger A.; Cincio L.; Coles P. J. Generalization in Quantum Machine Learning from Few Training Data. Nat. Commun. 2022, 13, 4919. 10.1038/s41467-022-32550-3. PubMed DOI PMC
Daley A. J.; Bloch I.; Kokail C.; Flannigan S.; Pearson N.; Troyer M.; Zoller P. Practical Quantum Advantage in Quantum Simulation. Nature 2022, 607 (7920), 667–676. 10.1038/s41586-022-04940-6. PubMed DOI
Ollitrault P. J.; Miessen A.; Tavernelli I. Molecular Quantum Dynamics: A Quantum Computing Perspective. Acc. Chem. Res. 2021, 54 (23), 4229–4238. 10.1021/acs.accounts.1c00514. PubMed DOI
Bender E. M.; Gebru T.; McMillan-Major A.; Shmitchell S.. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency FAccT ’21; Association for Computing Machinery: New York, NY, 2021; pp 610–623.
Patterson D.; Gonzalez J.; Holzle U.; Le Q.; Liang C.; Munguia L.-M.; Rothchild D.; So D. R.; Texier M.; Dean J. The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. Computer 2022, 55, 18. 10.1109/MC.2022.3148714. DOI
Vinod R.; Chen P.-Y.; Das P. Reprogramming Pretrained Language Models for Protein Sequence Representation Learning. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.02120. DOI
Caldararu O.; Blundell T. L.; Kepp K. P. Three Simple Properties Explain Protein Stability Change upon Mutation. J. Chem. Inf. Model. 2021, 61 (4), 1981–1988. 10.1021/acs.jcim.1c00201. PubMed DOI
Hu E. J.; Shen Y.; Wallis P.; Allen-Zhu Z.; Li Y.; Wang S.; Wang L.; Chen W.. LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations, April 25–29, 2022; OpenReview, 2022. https://openreview.net/forum?id=nZeVKeeFYf9
Taori R.; Gulrajani I.; Zhang T.; Dubois Y.; Li X.; Guestrin C.. Stanford Alpaca: An Instruction-Following Llama Model. 2023.
Yang A.; Miech A.; Sivic J.; Laptev I.; Schmid C.; Koyejo S.; Mohamed S.; Agarwal A.; Belgrave D.; Cho K.; Oh A. Zero-Shot Video Question Answering via Frozen Bidirectional Language Models. Adv. Neural Inf. Process. Syst. 2022, 35, 124–141.
Anstine D. M.; Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J. Am. Chem. Soc. 2023, 145 (16), 8736–8750. 10.1021/jacs.2c13467. PubMed DOI PMC
Popova M.; Isayev O.; Tropsha A. Deep Reinforcement Learning for de Novo Drug Design. Sci Adv 2018, 4 (7), eaap7885. 10.1126/sciadv.aap7885. PubMed DOI PMC
Lutz I. D.; Wang S.; Norn C.; Courbet A.; Borst A. J.; Zhao Y. T.; Dosey A.; Cao L.; Xu J.; Leaf E. M.; Treichel C.; Litvicov P.; Li Z.; Goodson A. D.; Rivera-Sánchez P.; Bratovianu A.-M.; Baek M.; King N. P.; Ruohola-Baker H.; Baker D. Top-down Design of Protein Architectures with Reinforcement Learning. Science 2023, 380 (6642), 266–273. 10.1126/science.adf6591. PubMed DOI
Wang Y.; Tang H.; Huang L.; Pan L.; Yang L.; Yang H.; Mu F.; Yang M. Self-Play Reinforcement Learning Guides Protein Engineering. Nature Machine Intelligence 2023, 5 (8), 845–860. 10.1038/s42256-023-00691-9. DOI