Machine Learning-Guided Protein Engineering

2023 Nov 03; 13 (21): 13863-13895. [epub] 20231013

Status: PubMed-not-MEDLINE. Language: English. Country: United States. Medium: electronic-ecollection

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid37942269

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.

Wu S.; Snajdrova R.; Moore J. C.; Baldenius K.; Bornscheuer U. T. Biocatalysis: Enzymatic Synthesis for Industrial Applications. Angew. Chem., Int. Ed. Engl. 2021, 60 (1), 88–119. 10.1002/anie.202006648. PubMed DOI PMC

Bell E. L.; Finnigan W.; France S. P.; Green A. P.; Hayes M. A.; Hepworth L. J.; Lovelock S. L.; Niikura H.; Osuna S.; Romero E.; Ryan K. S.; Turner N. J.; Flitsch S. L. Biocatalysis. Nat. Rev. Methods Primers 2021, 1, 46.10.1038/s43586-021-00044-z. DOI

Silvestre B. S.; Ţîrcă D. M. Innovations for Sustainable Development: Moving toward a Sustainable Future. J. Clean. Prod. 2019, 208, 325–332. 10.1016/j.jclepro.2018.09.244. DOI

Tiso T.; Winter B.; Wei R.; Hee J.; de Witt J.; Wierckx N.; Quicker P.; Bornscheuer U. T.; Bardow A.; Nogales J.; Blank L. M. The Metabolic Potential of Plastics as Biotechnological Carbon Sources - Review and Targets for the Future. Metab. Eng. 2022, 71, 77–98. 10.1016/j.ymben.2021.12.006. PubMed DOI

Pimviriyakul P.; Wongnate T.; Tinikul R.; Chaiyen P. Microbial Degradation of Halogenated Aromatics: Molecular Mechanisms and Enzymatic Reactions. Microb. Biotechnol. 2020, 13 (1), 67–86. 10.1111/1751-7915.13488. PubMed DOI PMC

Marques S. M.; Planas-Iglesias J.; Damborsky J. Web-Based Tools for Computational Enzyme Design. Curr. Opin. Struct. Biol. 2021, 69, 19–34. 10.1016/j.sbi.2021.01.010. PubMed DOI

Chang C.; Deringer V. L.; Katti K. S.; Van Speybroeck V.; Wolverton C. M. Simulations in the Era of Exascale Computing. Nat Rev Mater 2023, 8 (5), 309–313. 10.1038/s41578-023-00540-6. PubMed DOI PMC

Pyzer-Knapp E. O.; Pitera J. W.; Staar P. W. J.; Takeda S.; Laino T.; Sanders D. P.; Sexton J.; Smith J. R.; Curioni A. Accelerating Materials Discovery Using Artificial Intelligence, High Performance Computing and Robotics. npj Comput. Mater. 2022, 8, 84.10.1038/s41524-022-00765-z. DOI

Singh V.; Patra S.; Murugan N. A.; Toncu D.-C.; Tiwari A. Recent Trends in Computational Tools and Data-Driven Modeling for Advanced Materials. Mater. Adv. 2022, 3 (10), 4069–4087. 10.1039/D2MA00067A. DOI

Greener J. G.; Kandathil S. M.; Moffat L.; Jones D. T. A Guide to Machine Learning for Biologists. Nat. Rev. Mol. Cell Biol. 2022, 23 (1), 40–55. 10.1038/s41580-021-00407-0. PubMed DOI

Beller M.; Bender M.; Bornscheuer U. T.; Schunk S. Catalysis – Far from Being a Mature Technology. Chem. Ing. Tech. 2022, 94 (11), 1559–1559. 10.1002/cite.202271102. DOI

Oza V. H.; Whitlock J. H.; Wilk E. J.; Uno-Antonison A.; Wilk B.; Gajapathy M.; Howton T. C.; Trull A.; Ianov L.; Worthey E. A.; Lasseigne B. N. Ten Simple Rules for Using Public Biological Data for Your Research. PLoS Comput. Biol. 2023, 19 (1), e101074910.1371/journal.pcbi.1010749. PubMed DOI PMC

Mazurenko S.; Prokop Z.; Damborsky J. Machine Learning in Enzyme Engineering. ACS Catal. 2020, 10 (2), 1210–1223. 10.1021/acscatal.9b04321. DOI

Strokach A.; Kim P. M. Deep Generative Modeling for Protein Design. Curr. Opin. Struct. Biol. 2022, 72, 226–236. 10.1016/j.sbi.2021.11.008. PubMed DOI

Ding W.; Nakai K.; Gong H. Protein Design via Deep Learning. Brief. Bioinform. 2022, 23 (3), bbac10210.1093/bib/bbac102. PubMed DOI PMC

Pan X.; Kortemme T. Recent Advances in de Novo Protein Design: Principles, Methods, and Applications. J. Biol. Chem. 2021, 296, 10055810.1016/j.jbc.2021.100558. PubMed DOI PMC

Chandra A.; Tünnermann L.; Löfstedt T.; Gratz R. Transformer-Based Deep Learning for Predicting Protein Properties in the Life Sciences. Elife 2023, 12, e8281910.7554/eLife.82819. PubMed DOI PMC

Lin T.; Wang Y.; Liu X.; Qiu X. A Survey of Transformers. AI Open 2022, 3, 111–132. 10.1016/j.aiopen.2022.10.001. DOI

Zhang X.-M.; Liang L.; Liu L.; Tang M.-J. Graph Neural Networks and Their Current Applications in Bioinformatics. Front. Genet. 2021, 12, 69004910.3389/fgene.2021.690049. PubMed DOI PMC

Bronstein M. M.; Bruna J.; Cohen T.; Veličković P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv [cs.LG] 2021, 10.48550/arXiv.2104.13478. DOI

Alzubaidi L.; Zhang J.; Humaidi A. J.; Al-Dujaili A.; Duan Y.; Al-Shamma O.; Santamaría J.; Fadhel M. A.; Al-Amidie M.; Farhan L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J Big Data 2021, 8, 53.10.1186/s40537-021-00444-8. PubMed DOI PMC

Calin O.Deep Learning Architectures; Springer, 2020.

Goodfellow I.; Bengio Y.; Courville A.. Deep Learning; MIT Press, 2016.

Bordin N.; Dallago C.; Heinzinger M.; Kim S.; Littmann M.; Rauer C.; Steinegger M.; Rost B.; Orengo C. Novel Machine Learning Approaches Revolutionize Protein Knowledge. Trends Biochem. Sci. 2023, 48 (4), 345–359. 10.1016/j.tibs.2022.11.001. PubMed DOI PMC

Mowbray M.; Savage T.; Wu C.; Song Z.; Cho B. A.; Del Rio-Chanona E. A.; Zhang D. Machine Learning for Biochemical Engineering: A Review. Biochemical Eng. J. 2021, 172, 108054.10.1016/j.bej.2021.108054. DOI

Hon J.; Marusiak M.; Martinek T.; Kunka A.; Zendulka J.; Bednar D.; Damborsky J. SoluProt: Prediction of Soluble Protein Expression in Escherichia Coli. Bioinformatics 2021, 37 (1), 23–28. 10.1093/bioinformatics/btaa1102. PubMed DOI PMC

Wu K. E.; Yang K. K.; van den Berg R.; Zou J. Y.; Lu A. X.; Amini A. P. Protein Structure Generation via Folding Diffusion. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2209.15611. PubMed DOI PMC

Watson J. L.; Juergens D.; Bennett N. R.; Trippe B. L.; Yim J.; Eisenach H. E.; Ahern W.; Borst A. J.; Ragotte R. J.; Milles L. F.; Wicky B. I. M.; Hanikel N.; Pellock S. J.; Courbet A.; Sheffler W.; Wang J.; Venkatesh P.; Sappington I.; Torres S. V.; Lauko A.; De Bortoli V.; Mathieu E.; Ovchinnikov S.; Barzilay R.; Jaakkola T. S.; DiMaio F.; Baek M.; Baker D. De Novo Design of Protein Structure and Function with RFdiffusion. Nature 2023, 620, 1089.10.1038/s41586-023-06415-8. PubMed DOI PMC

Corso G.; Stärk H.; Jing B.; Barzilay R.; Jaakkola T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2210.01776. DOI

Guo Z.; Liu J.; Wang Y.; Chen M.; Wang D.; Xu D.; Cheng J. Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.10907. DOI

Shroff R.; Cole A. W.; Diaz D. J.; Morrow B. R.; Donnell I.; Annapareddy A.; Gollihar J.; Ellington A. D.; Thyer R. Discovery of Novel Gain-of-Function Mutations Guided by Structure-Based Deep Learning. ACS Synth. Biol. 2020, 9 (11), 2927–2935. 10.1021/acssynbio.0c00345. PubMed DOI

Zhang Z.; Xu M.; Chenthamarakshan V.; Lozano A.; Das P.; Tang J.. Enhancing Protein Language Models with Structure-Based Encoder and Pre-Training. arXiv (Quantitative Biology.Quantitative Methods), March 11, 2023, 2303.06275, ver. 1. 10.48550/arXiv.2303.06275 DOI

Diaz D. J.; Gong C.; Ouyang-Zhang J.; Loy J. M.; Wells J.; Yang D.; Ellington A. D.; Dimakis A.; Klivans A. R.. Stability Oracle: A Structure-Based Graph-Transformer for Identifying Stabilizing Mutations. bioRxiv (Biochemistry), March 22, 2023, 2023.05.15.540857. 10.1101/2023.05.15.540857. PubMed DOI PMC

Ferruz N.; Heinzinger M.; Akdel M.; Goncearenco A.; Naef L.; Dallago C. From Sequence to Function through Structure: Deep Learning for Protein Design. Comput. Struct. Biotechnol. J. 2023, 21, 238–250. 10.1016/j.csbj.2022.11.014. PubMed DOI PMC

Madani A.; Krause B.; Greene E. R.; Subramanian S.; Mohr B. P.; Holton J. M.; Olmos J. L. Jr; Xiong C.; Sun Z. Z.; Socher R.; Fraser J. S.; Naik N. Large Language Models Generate Functional Protein Sequences across Diverse Families. Nat. Biotechnol. 2023, 41, 1099.10.1038/s41587-022-01618-2. PubMed DOI PMC

Li Y.; Rezaei M. A.; Li C.; Li X.. DeepAtom: A Framework for Protein-Ligand Binding Affinity Prediction. In Proceedings of the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, November 18–21, 2019; IEEE, 2019; pp 303–310.

Park S.; Seok C. GalaxyWater-CNN: Prediction of Water Positions on the Protein Structure by a 3D-Convolutional Neural Network. J. Chem. Inf. Model. 2022, 62 (13), 3157–3168. 10.1021/acs.jcim.2c00306. PubMed DOI

Ramesh A.; Dhariwal P.; Nichol A.; Chu C.; Chen M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv [cs.CV] 2022, 10.48550/ARXIV.2204.06125. DOI

Rombach R.; Blattmann A.; Lorenz D.; Esser P.; Ommer B. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv [cs.CV] 2021, 10.48550/arXiv.2112.10752. DOI

Schneuing A.; Du Y.; Harris C.; Jamasb A.; Igashov I.; Du W.; Blundell T.; Lió P.; Gomes C.; Welling M.; Bronstein M.; Correia B. Structure-Based Drug Design with Equivariant Diffusion Models. arXiv [q-bio.BM] 2022, 10.48550/arXiv.2210.13695. DOI

Igashov I.; Stärk H.; Vignac C.; Satorras V. G.; Frossard P.; Welling M.; Bronstein M. M.; Correia B.. Equivariant 3D-Conditional Diffusion Models for Molecular Linker Design. OpenReview, February 1, 2023. https://openreview.net/forum?id=cnsHSSLnHVV.

Yang A.; Nagrani A.; Seo P. H.; Miech A.; Pont-Tuset J.; Laptev I.; Sivic J.; Schmid C.. Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, June 2022, 2023; Computer Vision Foundation, 2023; pp 10714–10726.

Huang C.; Wu Z.; Wen J.; Xu Y.; Jiang Q.; Wang Y. Abnormal Event Detection Using Deep Contrastive Learning for Intelligent Video Surveillance System. IEEE Trans. Ind. Inf. 2022, 18 (8), 5171–5179. 10.1109/TII.2021.3122801. DOI

Ho J.; Chan W.; Saharia C.; Whang J.; Gao R.; Gritsenko A.; Kingma D. P.; Poole B.; Norouzi M.; Fleet D. J.; Salimans T. Imagen Video: High Definition Video Generation with Diffusion Models. arXiv [cs.CV] 2022, 10.48550/arXiv.2210.02303. DOI

Villegas R.; Babaeizadeh M.; Kindermans P.-J.; Moraldo H.; Zhang H.; Saffar M. T.; Castro S.; Kunze J.; Erhan D.. Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=vOEXS39nOF

Singer U.; Polyak A.; Hayes T.; Yin X.; An J.; Zhang S.; Hu Q.; Yang H.; Ashual O.; Gafni O.; Parikh D.; Gupta S.; Taigman Y.. Make-A-Video: Text-to-Video Generation without Text-Video Data. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=nJfylDvgzlq

Hung M.; Lauren E.; Hon E. S.; Birmingham W. C.; Xu J.; Su S.; Hon S. D.; Park J.; Dang P.; Lipsky M. S. Social Network Analysis of COVID-19 Sentiments: Application of Artificial Intelligence. J. Med. Internet Res. 2020, 22 (8), e2259010.2196/22590. PubMed DOI PMC

Bryant P.; Pozzati G.; Elofsson A. Improved Prediction of Protein-Protein Interactions Using AlphaFold2. Nat. Commun. 2022, 13, 1265.10.1038/s41467-022-28865-w. PubMed DOI PMC

Muzio G.; O’Bray L.; Borgwardt K. Biological Network Analysis with Deep Learning. Brief. Bioinform. 2021, 22 (2), 1515–1530. 10.1093/bib/bbaa257. PubMed DOI PMC

Chen J.; Zheng S.; Zhao H.; Yang Y. Structure-Aware Protein Solubility Prediction from Sequence through Graph Convolutional Network and Predicted Contact Map. J. Cheminform. 2021, 13, 7.10.1186/s13321-021-00488-1. PubMed DOI PMC

Jiang J.; Wang R.; Wei G.-W. GGL-Tox: Geometric Graph Learning for Toxicity Prediction. J. Chem. Inf. Model. 2021, 61 (4), 1691–1700. 10.1021/acs.jcim.0c01294. PubMed DOI PMC

Hu W.; Fey M.; Zitnik M.; Dong Y.; Ren H.; Liu B.; Catasta M.; Leskovec J.; Larochelle H.; Ranzato M.; Hadsell R.; Balcan M. F.; Lin H. Open Graph Benchmark: Datasets for Machine Learning on Graphs. Adv. Neural Inf. Process. Syst. 2020, 33, 22118–22133.

Kawashima S.; Pokarowski P.; Pokarowska M.; Kolinski A.; Katayama T.; Kanehisa M. AAindex: Amino Acid Index Database, Progress Report 2008. Nucleic Acids Res. 2007, 36, D202–D205. 10.1093/nar/gkm998. PubMed DOI PMC

ElAbd H.; Bromberg Y.; Hoarfrost A.; Lenz T.; Franke A.; Wendorff M. Amino Acid Encoding for Deep Learning Applications. BMC Bioinformatics 2020, 21, 235.10.1186/s12859-020-03546-x. PubMed DOI PMC

Raimondi D.; Orlando G.; Vranken W. F.; Moreau Y. Exploring the Limitations of Biophysical Propensity Scales Coupled with Machine Learning for Protein Sequence Analysis. Sci. Rep. 2019, 9, 16932.10.1038/s41598-019-53324-w. PubMed DOI PMC

Kandathil S. M.; Greener J. G.; Lau A. M.; Jones D. T. Ultrafast End-to-End Protein Structure Prediction Enables High-Throughput Exploration of Uncharacterized Proteins. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (4), e211334811910.1073/pnas.2113348119. PubMed DOI PMC

Fasoulis R.; Paliouras G.; Kavraki L. E. Graph Representation Learning for Structural Proteomics. Emerg Top Life Sci 2021, 5 (6), 789–802. 10.1042/ETLS20210225. PubMed DOI PMC

Hermosilla P.; Schäfer M.; Lang M.; Fackelmann G.; Vázquez P. P.; Kozlíková B.; Krone M.; Ritschel T.; Ropinski T.. Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures. Ninth International Conference on Learning Representations, May 3–7, 2021; OpenReview, 2021.

Batzner S.; Musaelian A.; Sun L.; Geiger M.; Mailoa J. P.; Kornbluth M.; Molinari N.; Smidt T. E.; Kozinsky B. E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials. Nat. Commun. 2022, 13, 2453.10.1038/s41467-022-29939-5. PubMed DOI PMC

Gligorijević V.; Renfrew P. D.; Kosciolek T.; Leman J. K.; Berenberg D.; Vatanen T.; Chandler C.; Taylor B. C.; Fisk I. M.; Vlamakis H.; Xavier R. J.; Knight R.; Cho K.; Bonneau R. Structure-Based Protein Function Prediction Using Graph Convolutional Networks. Nat. Commun. 2021, 12, 3168.10.1038/s41467-021-23303-9. PubMed DOI PMC

Gao Z.; Jiang C.; Zhang J.; Jiang X.; Li L.; Zhao P.; Yang H.; Huang Y.; Li J. Hierarchical Graph Learning for Protein-Protein Interaction. Nat. Commun. 2023, 14, 1093.10.1038/s41467-023-36736-1. PubMed DOI PMC

Vaswani A.; Shazeer N.; Parmar N.; Uszkoreit J.; Jones L.; Gomez A. N.; Kaiser Ł.; Polosukhin I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5999.

Fuchs F.; Worrall D.; Fischer V.; Welling M.; Larochelle H.; Ranzato M.; Hadsell R.; Balcan M. F.; Lin H. SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks. Adv. Neural Inf. Process. Syst. 2020, 33, 1970–1981.

Detlefsen N. S.; Hauberg S.; Boomsma W. Learning Meaningful Representations of Protein Sequences. Nat. Commun. 2022, 13, 1914.10.1038/s41467-022-29443-w. PubMed DOI PMC

Rives A.; Meier J.; Sercu T.; Goyal S.; Lin Z.; Liu J.; Guo D.; Ott M.; Zitnick C. L.; Ma J.; Fergus R. Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. Proc. Natl. Acad. Sci. U. S. A. 2021, 118 (15), e201623911810.1073/pnas.2016239118. PubMed DOI PMC

Meier J.; Rao R.; Verkuil R.; Liu J.; Sercu T.; Rives A.; Ranzato M.; Beygelzimer A.; Dauphin Y.; Liang P. S.; Vaughan J. W. Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function. Adv. Neural Inf. Process. Syst. 2021, 34, 29287–29303.

Lin Z.; Akin H.; Rao R.; Hie B.; Zhu Z.; Lu W.; Smetanin N.; Verkuil R.; Kabeli O.; Shmueli Y.; Dos Santos Costa A.; Fazel-Zarandi M.; Sercu T.; Candido S.; Rives A. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. Science 2023, 379 (6637), 1123–1130. 10.1126/science.ade2574. PubMed DOI

Zhang Z.; Xu M.; Jamasb A.; Chenthamarakshan V.; Lozano A.; Das P.; Tang J.. Protein Representation Learning by Geometric Structure Pretraining. The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=to3qCB3tOh9

Fowler D. M.; Fields S. Deep Mutational Scanning: A New Style of Protein Science. Nat. Methods 2014, 11 (8), 801–807. 10.1038/nmeth.3027. PubMed DOI PMC

Vanella R.; Kovacevic G.; Doffini V.; Fernández de Santaella J.; Nash M. A. High-Throughput Screening, next Generation Sequencing and Machine Learning: Advanced Methods in Enzyme Engineering. Chem. Commun. 2022, 58 (15), 2455–2467. 10.1039/D1CC04635G. PubMed DOI PMC

Morrison K. L.; Weiss G. A. Combinatorial Alanine-Scanning. Curr. Opin. Chem. Biol. 2001, 5 (3), 302–307. 10.1016/S1367-5931(00)00206-4. PubMed DOI

Brown T.; Mann B.; Ryder N.; Subbiah M.; Kaplan J. D.; Dhariwal P.; Neelakantan A.; Shyam P.; Sastry G.; Askell A. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.

OpenAI . GPT-4 Technical Report. arXiv (Computer Science.Computation and Language), March 27, 2023, 2303.08774. https://arxiv.org/abs/2303.08774.

Luo R.; Sun L.; Xia Y.; Qin T.; Zhang S.; Poon H.; Liu T.-Y. BioGPT: Generative Pre-Trained Transformer for Biomedical Text Generation and Mining. Brief. Bioinform. 2022, 23 (6), bbac40910.1093/bib/bbac409. PubMed DOI

Zhu Z.; Shi C.; Zhang Z.; Liu S.; Xu M.; Yuan X.; Zhang Y.; Chen J.; Cai H.; Lu J.; Ma C.; Liu R.; Xhonneux L.-P.; Qu M.; Tang J. TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery. arXiv [cs.LG] 2022, 10.48550/arXiv.2202.08320. DOI

Siedhoff N. E.; Illig A.-M.; Schwaneberg U.; Davari M. D. PyPEF-An Integrated Framework for Data-Driven Protein Engineering. J. Chem. Inf. Model. 2021, 61 (7), 3463–3476. 10.1021/acs.jcim.1c00099. PubMed DOI

Draizen E. J.; Murillo L. F. R.; Readey J.; Mura C.; Bourne P. E.. Prop3D: A Flexible, Python-Based Platform for Machine Learning with Protein Structural Properties and Biophysical Data. bioRxiv, 2022, 2022.12.27.522071. 10.1101/2022.12.27.522071. PubMed DOI PMC

Berman H. M.; Westbrook J.; Feng Z.; Gilliland G.; Bhat T. N.; Weissig H.; Shindyalov I. N.; Bourne P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242. 10.1093/nar/28.1.235. PubMed DOI PMC

Chothia C.; Lesk A. M. The Relation between the Divergence of Sequence and Structure in Proteins. EMBO J. 1986, 5 (4), 823–826. 10.1002/j.1460-2075.1986.tb04288.x. PubMed DOI PMC

van Kempen M.; Kim S. S.; Tumescheit C.; Mirdita M.; Lee J.; Gilchrist C. L. M.; Söding J.; Steinegger M. Fast and Accurate Protein Structure Search with Foldseek. Nat. Biotechnol. 2023, 10.1038/s41587-023-01773-0. PubMed DOI PMC

Brookes D.; Park H.; Listgarten J.. Conditioning by Adaptive Sampling for Robust Design. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri K., Salakhutdinov R., Eds.; Proceedings of Machine Learning Research, Vol. 97; PMLR, 2019; pp 773–782.

Sinai S.; Wang R.; Whatley A.; Slocum S.; Locane E.; Kelsic E. D.. AdaLead: A Simple and Robust Adaptive Greedy Search Algorithm for Sequence Design. arXiv (Computer Science.Machine Learning), October 5, 2020, 2010.02141, ver. 1.10.48550/arXiv.2010.02141 DOI

Ren Z.; Li J.; Ding F.; Zhou Y.; Ma J.; Peng J.. Proximal Exploration for Model-Guided Protein Sequence Design. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri K., Jegelka S., Song L., Szepesvari C., Niu G., Sabato S., Eds.; Proceedings of Machine Learning Research, Vol. 162; PMLR, 2022; pp 18520–18536.

Lipsh-Sokolik R.; Khersonsky O.; Schröder S. P.; de Boer C.; Hoch S.-Y.; Davies G. J.; Overkleeft H. S.; Fleishman S. J. Combinatorial Assembly and Design of Enzymes. Science 2023, 379 (6628), 195–201. 10.1126/science.ade9434. PubMed DOI

Yu T.; Boob A. G.; Volk M. J.; Liu X.; Cui H.; Zhao H. Machine Learning-Enabled Retrobiosynthesis of Molecules. Nat. Catal. 2023, 6, 137.10.1038/s41929-022-00909-w. DOI

Mistry J.; Chuguransky S.; Williams L.; Qureshi M.; Salazar G. A.; Sonnhammer E. L. L.; Tosatto S. C. E.; Paladin L.; Raj S.; Richardson L. J.; Finn R. D.; Bateman A. Pfam: The Protein Families Database in 2021. Nucleic Acids Res. 2021, 49 (D1), D412–D419. 10.1093/nar/gkaa913. PubMed DOI PMC

Pandurangan A. P.; Stahlhacke J.; Oates M. E.; Smithers B.; Gough J. The SUPERFAMILY 2.0 Database: A Significant Proteome Update and a New Webserver. Nucleic Acids Res. 2019, 47 (D1), D490–D494. 10.1093/nar/gky1130. PubMed DOI PMC

Sillitoe I.; Bordin N.; Dawson N.; Waman V. P.; Ashford P.; Scholes H. M.; Pang C. S. M.; Woodridge L.; Rauer C.; Sen N.; Abbasian M.; Le Cornu S.; Lam S. D.; Berka K.; Varekova I. H.; Svobodova R.; Lees J.; Orengo C. A. CATH: Increased Structural Coverage of Functional Space. Nucleic Acids Res. 2021, 49 (D1), D266–D273. 10.1093/nar/gkaa1079. PubMed DOI PMC

Alcántara R.; Axelsen K. B.; Morgat A.; Belda E.; Coudert E.; Bridge A.; Cao H.; de Matos P.; Ennis M.; Turner S.; Owen G.; Bougueleret L.; Xenarios I.; Steinbeck C. Rhea--a Manually Curated Resource of Biochemical Reactions. Nucleic Acids Res. 2012, 40, D754–D760. 10.1093/nar/gkr1126. PubMed DOI PMC

Schomburg I.; Chang A.; Schomburg D. BRENDA, Enzyme Data and Metabolic Information. Nucleic Acids Res. 2002, 30 (1), 47–49. 10.1093/nar/30.1.47. PubMed DOI PMC

Wittig U.; Rey M.; Weidemann A.; Kania R.; Müller W. SABIO-RK: An Updated Resource for Manually Curated Biochemical Reaction Kinetics. Nucleic Acids Res. 2018, 46 (D1), D656–D660. 10.1093/nar/gkx1065. PubMed DOI PMC

Wishart D. S.; Li C.; Marcu A.; Badran H.; Pon A.; Budinski Z.; Patron J.; Lipton D.; Cao X.; Oler E.; Li K.; Paccoud M.; Hong C.; Guo A. C.; Chan C.; Wei W.; Ramirez-Gaona M. PathBank: A Comprehensive Pathway Database for Model Organisms. Nucleic Acids Res. 2020, 48 (D1), D470–D478. 10.1093/nar/gkz861. PubMed DOI PMC

Hafner J.; MohammadiPeyhani H.; Sveshnikova A.; Scheidegger A.; Hatzimanikatis V. Updated ATLAS of Biochemistry with New Metabolites and Improved Enzyme Prediction Power. ACS Synth. Biol. 2020, 9 (6), 1479–1482. 10.1021/acssynbio.0c00052. PubMed DOI PMC

Ganter M.; Bernard T.; Moretti S.; Stelling J.; Pagni M. Metanetx.org: A Website and Repository for Accessing, Analysing and Manipulating Metabolic Networks. Bioinformatics 2013, 29 (6), 815–816. 10.1093/bioinformatics/btt036. PubMed DOI PMC

Bairoch A. The ENZYME Database in 2000. Nucleic Acids Res. 2000, 28 (1), 304–305. 10.1093/nar/28.1.304. PubMed DOI PMC

McDonald A. G.; Tipton K. F. Enzyme Nomenclature and Classification: The State of the Art. FEBS J. 2023, 290 (9), 2214–2231. 10.1111/febs.16274. PubMed DOI

Probst D.; Manica M.; Nana Teukam Y. G.; Castrogiovanni A.; Paratore F.; Laino T. Biocatalysed Synthesis Planning Using Data-Driven Learning. Nat. Commun. 2022, 13, 964.10.1038/s41467-022-28536-w. PubMed DOI PMC

Heid E.; Probst D.; Green W. H.; Madsen G. K. H. EnzymeMap: Curation, Validation and Data-Driven Prediction of Enzymatic Reactions. ChemRxiv 2023, 10.26434/chemrxiv-2023-jzw9w. PubMed DOI PMC

Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Žídek A.; Potapenko A.; Bridgland A.; Meyer C.; Kohl S. A. A.; Ballard A. J.; Cowie A.; Romera-Paredes B.; Nikolov S.; Jain R.; Adler J.; Back T.; Petersen S.; Reiman D.; Clancy E.; Zielinski M.; Steinegger M.; Pacholska M.; Berghammer T.; Bodenstein S.; Silver D.; Vinyals O.; Senior A. W.; Kavukcuoglu K.; Kohli P.; Hassabis D. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589. 10.1038/s41586-021-03819-2. PubMed DOI PMC

Bileschi M. L.; Belanger D.; Bryant D. H.; Sanderson T.; Carter B.; Sculley D.; Bateman A.; DePristo M. A.; Colwell L. J. Using Deep Learning to Annotate the Protein Universe. Nat. Biotechnol. 2022, 40 (6), 932–937. 10.1038/s41587-021-01179-w. PubMed DOI

Nallapareddy V.; Bordin N.; Sillitoe I.; Heinzinger M.; Littmann M.; Waman V. P.; Sen N.; Rost B.; Orengo C. CATHe: Detection of Remote Homologues for CATH Superfamilies Using Embeddings from Protein Language Models. Bioinformatics 2023, 39, btad02910.1093/bioinformatics/btad029. PubMed DOI PMC

Jiang S.-Y.; Jin J.; Sarojam R.; Ramachandran S. A Comprehensive Survey on the Terpene Synthase Gene Family Provides New Insight into Its Evolutionary Patterns. Genome Biol. Evol. 2019, 11 (8), 2078–2098. 10.1093/gbe/evz142. PubMed DOI PMC

Claudel-Renard C.; Chevalet C.; Faraut T.; Kahn D. Enzyme-Specific Profiles for Genome Annotation: PRIAM. Nucleic Acids Res. 2003, 31 (22), 6633–6639. 10.1093/nar/gkg847. PubMed DOI PMC

Shen H.-B.; Chou K.-C. EzyPred: A Top–down Approach for Predicting Enzyme Functional Classes and Subclasses. Biochem. Biophys. Res. Commun. 2007, 364 (1), 53–59. 10.1016/j.bbrc.2007.09.098. PubMed DOI

Dalkiran A.; Rifaioglu A. S.; Martin M. J.; Cetin-Atalay R.; Atalay V.; Doğan T. ECPred: A Tool for the Prediction of the Enzymatic Functions of Protein Sequences Based on the EC Nomenclature. BMC Bioinformatics 2018, 19, 334.10.1186/s12859-018-2368-y. PubMed DOI PMC

Huang W.-L.; Chen H.-M.; Hwang S.-F.; Ho S.-Y. Accurate Prediction of Enzyme Subfamily Class Using an Adaptive Fuzzy K-Nearest Neighbor Method. Biosystems. 2007, 90 (2), 405–413. 10.1016/j.biosystems.2006.10.004. PubMed DOI

Nasibov E.; Kandemir-Cavas C. Efficiency Analysis of KNN and Minimum Distance-Based Classifiers in Enzyme Family Prediction. Comput. Biol. Chem. 2009, 33 (6), 461–464. 10.1016/j.compbiolchem.2009.09.002. PubMed DOI

De Ferrari L.; Aitken S.; van Hemert J.; Goryanin I. EnzML: Multi-Label Prediction of Enzyme Classes Using InterPro Signatures. BMC Bioinformatics 2012, 13, 61.10.1186/1471-2105-13-61. PubMed DOI PMC

Dobson P. D.; Doig A. J. Predicting Enzyme Class from Protein Structure without Alignments. J. Mol. Biol. 2005, 345 (1), 187–199. 10.1016/j.jmb.2004.10.024. PubMed DOI

Kumar N.; Skolnick J. EFICAz2.5: Application of a High-Precision Enzyme Function Predictor to 396 Proteomes. Bioinformatics 2012, 28 (20), 2687–2688. 10.1093/bioinformatics/bts510. PubMed DOI PMC

Matsuta Y.; Ito M.; Tohsato Y. ECOH: An Enzyme Commission Number Predictor Using Mutual Information and a Support Vector Machine. Bioinformatics 2013, 29 (3), 365–372. 10.1093/bioinformatics/bts700. PubMed DOI

Li Y. H.; Xu J. Y.; Tao L.; Li X. F.; Li S.; Zeng X.; Chen S. Y.; Zhang P.; Qin C.; Zhang C.; Chen Z.; Zhu F.; Chen Y. Z. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLoS One 2016, 11 (8), e015529010.1371/journal.pone.0155290. PubMed DOI PMC

Nagao C.; Nagano N.; Mizuguchi K. Prediction of Detailed Enzyme Functions and Identification of Specificity Determining Residues by Random Forests. PLoS One 2014, 9 (1), e8462310.1371/journal.pone.0084623. PubMed DOI PMC

Kumar C.; Choudhary A. A Top-down Approach to Classify Enzyme Functional Classes and Sub-Classes Using Random Forest. EURASIP J. Bioinform. Syst. Biol. 2012, 2012, 1.10.1186/1687-4153-2012-1. PubMed DOI PMC

Volpato V.; Adelfio A.; Pollastri G. Accurate Prediction of Protein Enzymatic Class by N-to-1 Neural Networks. BMC Bioinformatics 2013, 14, S11.10.1186/1471-2105-14-S1-S11. PubMed DOI PMC

Amidi A.; Amidi S.; Vlachakis D.; Megalooikonomou V.; Paragios N.; Zacharaki E. I. EnzyNet: Enzyme Classification Using 3D Convolutional Neural Networks on Spatial Representation. PeerJ 2018, 6, e475010.7717/peerj.4750. PubMed DOI PMC

Ryu J. Y.; Kim H. U.; Lee S. Y. Deep Learning Enables High-Quality and High-Throughput Prediction of Enzyme Commission Numbers. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (28), 13996–14001. 10.1073/pnas.1821905116. PubMed DOI PMC

Sanderson T.; Bileschi M. L.; Belanger D.; Colwell L. J. ProteInfer, Deep Neural Networks for Protein Functional Inference. Elife 2023, 12, e8094210.7554/eLife.80942. PubMed DOI PMC

Yu T.; Cui H.; Li J. C.; Luo Y.; Jiang G.; Zhao H. Enzyme Function Prediction Using Contrastive Learning. Science 2023, 379 (6639), 1358–1363. 10.1126/science.adf2465. PubMed DOI

Levin I.; Liu M.; Voigt C. A.; Coley C. W. Merging Enzymatic and Synthetic Chemistry with Computational Synthesis Planning. Nat. Commun. 2022, 13, 7747.10.1038/s41467-022-35422-y. PubMed DOI PMC

Zheng S.; Zeng T.; Li C.; Chen B.; Coley C. W.; Yang Y.; Wu R. Deep Learning Driven Biosynthetic Pathways Navigation for Natural Products with BioNavi-NP. Nat. Commun. 2022, 13, 3342.10.1038/s41467-022-30970-9. PubMed DOI PMC

Watanabe N.; Yamamoto M.; Murata M.; Vavricka C. J.; Ogino C.; Kondo A.; Araki M. Comprehensive Machine Learning Prediction of Extensive Enzymatic Reactions. J. Phys. Chem. B 2022, 126 (36), 6762–6770. 10.1021/acs.jpcb.2c03287. PubMed DOI

Kroll A.; Ranjan S.; Engqvist M. K. M.; Lercher M. J. A General Model to Predict Small Molecule Substrates of Enzymes Based on Machine and Deep Learning. Nat. Commun. 2023, 14, 2787.10.1038/s41467-023-38347-2. PubMed DOI PMC

Goldman S.; Das R.; Yang K. K.; Coley C. W. Machine Learning Modeling of Family Wide Enzyme-Substrate Specificity Screens. PLoS Comput. Biol. 2022, 18 (2), e100985310.1371/journal.pcbi.1009853. PubMed DOI PMC

Berman H. M.; Gabanyi M. J.; Kouranov A.; Micallef D. I.; Westbrook J.; Protein Structure Initiative network of investigators . Protein Structure Initiative - TargetTrack 2000-2017 - all data files. Zenodo, 2017. 10.5281/zenodo.821654 DOI

Jarzab A.; Kurzawa N.; Hopf T.; Moerch M.; Zecha J.; Leijten N.; Bian Y.; Musiol E.; Maschberger M.; Stoehr G.; Becher I.; Daly C.; Samaras P.; Mergner J.; Spanier B.; Angelov A.; Werner T.; Bantscheff M.; Wilhelm M.; Klingenspor M.; Lemeer S.; Liebl W.; Hahne H.; Savitski M. M.; Kuster B. Meltome Atlas-Thermal Proteome Stability across the Tree of Life. Nat. Methods 2020, 17 (5), 495–503. 10.1038/s41592-020-0801-4. PubMed DOI

Yang Y.; Zhao J.; Zeng L.; Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int. J. Mol. Sci. 2022, 23 (18), 10798.10.3390/ijms231810798. PubMed DOI PMC

Sapoval N.; Aghazadeh A.; Nute M. G.; Antunes D. A.; Balaji A.; Baraniuk R.; Barberan C. J.; Dannenfelser R.; Dun C.; Edrisi M.; Elworth R. A. L.; Kille B.; Kyrillidis A.; Nakhleh L.; Wolfe C. R.; Yan Z.; Yao V.; Treangen T. J. Current Progress and Open Challenges for Applying Deep Learning across the Biosciences. Nat. Commun. 2022, 13, 1728.10.1038/s41467-022-29268-7. PubMed DOI PMC

Diaz D. J.; Kulikova A. V.; Ellington A. D.; Wilke C. O. Using Machine Learning to Predict the Effects and Consequences of Mutations in Proteins. Curr. Opin. Struct. Biol. 2023, 78, 10251810.1016/j.sbi.2022.102518. PubMed DOI PMC

Thumuluri V.; Martiny H.-M.; Almagro Armenteros J. J.; Salomon J.; Nielsen H.; Johansen A. R. NetSolP: Predicting Protein Solubility in Escherichia Coli Using Language Models. Bioinformatics 2022, 38 (4), 941–946. 10.1093/bioinformatics/btab801. PubMed DOI

Caldararu O.; Mehra R.; Blundell T. L.; Kepp K. P. Systematic Investigation of the Data Set Dependency of Protein Stability Predictors. J. Chem. Inf. Model. 2020, 60 (10), 4772–4784. 10.1021/acs.jcim.0c00591. PubMed DOI

Mazurenko S. Predicting Protein Stability and Solubility Changes upon Mutations: Data Perspective. ChemCatChem 2020, 12 (22), 5590–5598. 10.1002/cctc.202000933. DOI

Velecký J.; Hamsikova M.; Stourac J.; Musil M.; Damborsky J.; Bednar D.; Mazurenko S. SoluProtMutDB: A Manually Curated Database of Protein Solubility Changes upon Mutations. Comput. Struct. Biotechnol. J. 2022, 20, 6339–6347. 10.1016/j.csbj.2022.11.009. PubMed DOI PMC

Wang S.; Tang H.; Zhao Y.; Zuo L. BayeStab: Predicting Effects of Mutations on Protein Stability with Uncertainty Quantification. Protein Sci. 2022, 31 (11), e4467. 10.1002/pro.4467. PubMed DOI PMC

Nikam R.; Kulandaisamy A.; Harini K.; Sharma D.; Gromiha M. M. ProThermDB: Thermodynamic Database for Proteins and Mutants Revisited after 15 Years. Nucleic Acids Res. 2021, 49 (D1), D420–D424. 10.1093/nar/gkaa1035. PubMed DOI PMC

Iqbal S.; Ge F.; Li F.; Akutsu T.; Zheng Y.; Gasser R. B.; Yu D.-J.; Webb G. I.; Song J. PROST: AlphaFold2-Aware Sequence-Based Predictor to Estimate Protein Stability Changes upon Missense Mutations. J. Chem. Inf. Model. 2022, 62 (17), 4270–4282. 10.1021/acs.jcim.2c00799. PubMed DOI

Hernández I. M.; Dehouck Y.; Bastolla U.; López-Blanco J. R.; Chacón P. Predicting Protein Stability Changes upon Mutation Using a Simple Orientational Potential. Bioinformatics 2023, 39 (1), btad011. 10.1093/bioinformatics/btad011. PubMed DOI PMC

Xavier J. S.; Nguyen T.-B.; Karmarkar M.; Portelli S.; Rezende P. M.; Velloso J. P. L.; Ascher D. B.; Pires D. E. V. ThermoMutDB: A Thermodynamic Database for Missense Mutations. Nucleic Acids Res. 2021, 49 (D1), D475–D479. 10.1093/nar/gkaa925. PubMed DOI PMC

Pak M. A.; Markhieva K. A.; Novikova M. S.; Petrov D. S.; Vorobyev I. S.; Maksimova E. S.; Kondrashov F. A.; Ivankov D. N. Using AlphaFold to Predict the Impact of Single Mutations on Protein Stability and Function. PLoS One 2023, 18 (3), e0282689. 10.1371/journal.pone.0282689. PubMed DOI PMC

Tsuboyama K.; Dauparas J.; Chen J.; Laine E.; Mohseni Behbahani Y.; Weinstein J. J.; Mangan N. M.; Ovchinnikov S.; Rocklin G. J. Mega-Scale Experimental Analysis of Protein Folding Stability in Biology and Design. Nature 2023, 620 (7973), 434–444. 10.1038/s41586-023-06328-6. PubMed DOI PMC

Yang Y.; Zeng L.; Vihinen M. PON-Sol2: Prediction of Effects of Variants on Protein Solubility. Int. J. Mol. Sci. 2021, 22 (15), 8027. 10.3390/ijms22158027. PubMed DOI PMC

Li F.; Yuan L.; Lu H.; Li G.; Chen Y.; Engqvist M. K. M.; Kerkhoven E. J.; Nielsen J. Deep Learning-Based Kcat Prediction Enables Improved Enzyme-Constrained Model Reconstruction. Nature Catalysis 2022, 5 (8), 662–672. 10.1038/s41929-022-00798-z. DOI

Xie W. J.; Asadi M.; Warshel A. Enhancing Computational Enzyme Design by a Maximum Entropy Strategy. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (7), e2122355119. 10.1073/pnas.2122355119. PubMed DOI PMC

Ostafe R.; Fontaine N.; Frank D.; Ng Fuk Chong M.; Prodanovic R.; Pandjaitan R.; Offmann B.; Cadet F.; Fischer R. One-Shot Optimization of Multiple Enzyme Parameters: Tailoring Glucose Oxidase for pH and Electron Mediators. Biotechnol. Bioeng. 2020, 117 (1), 17–29. 10.1002/bit.27169. PubMed DOI

Høie M. H.; Cagiada M.; Beck Frederiksen A. H.; Stein A.; Lindorff-Larsen K. Predicting and Interpreting Large-Scale Mutagenesis Data Using Analyses of Protein Stability and Conservation. Cell Rep. 2022, 38 (2), 110207. 10.1016/j.celrep.2021.110207. PubMed DOI

Cendrowska J. PRISM: An Algorithm for Inducing Modular Rules. Int. J. Man. Mach. Stud. 1987, 27 (4), 349–370. 10.1016/S0020-7373(87)80003-2. DOI

Gupta A.; Agrawal S. Machine Learning-Based Enzyme Engineering of PETase for Improved Efficiency in Degrading Non-Biodegradable Plastic. bioRxiv 2022, 10.1101/2022.01.11.475766. DOI

Gado J. E.; Beckham G. T.; Payne C. M. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J. Chem. Inf. Model. 2020, 60 (8), 4098–4107. 10.1021/acs.jcim.0c00489. PubMed DOI

Voutilainen S.; Heinonen M.; Andberg M.; Jokinen E.; Maaheimo H.; Pääkkönen J.; Hakulinen N.; Rouvinen J.; Lähdesmäki H.; Kaski S.; Rousu J.; Penttilä M.; Koivula A. Substrate Specificity of 2-Deoxy-D-Ribose 5-Phosphate Aldolase (DERA) Assessed by Different Protein Engineering and Machine Learning Methods. Appl. Microbiol. Biotechnol. 2020, 104 (24), 10515–10529. 10.1007/s00253-020-10960-x. PubMed DOI PMC

Prabakaran R.; Rawat P.; Kumar S.; Michael Gromiha M. ANuPP: A Versatile Tool to Predict Aggregation Nucleating Regions in Peptides and Proteins. J. Mol. Biol. 2021, 433 (11), 166707. 10.1016/j.jmb.2020.11.006. PubMed DOI

Thangakani A. M.; Nagarajan R.; Kumar S.; Sakthivel R.; Velmurugan D.; Gromiha M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11 (4), e0152949. 10.1371/journal.pone.0152949. PubMed DOI PMC

Rawat P.; Prabakaran R.; Sakthivel R.; Mary Thangakani A.; Kumar S.; Gromiha M. M. CPAD 2.0: A Repository of Curated Experimental Data on Aggregating Proteins and Peptides. Amyloid 2020, 27 (2), 128–133. 10.1080/13506129.2020.1715363. PubMed DOI

Beerten J.; Van Durme J.; Gallardo R.; Capriotti E.; Serpell L.; Rousseau F.; Schymkowitz J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31 (10), 1698–1700. 10.1093/bioinformatics/btv027. PubMed DOI

Louros N.; Konstantoulea K.; De Vleeschouwer M.; Ramakers M.; Schymkowitz J.; Rousseau F. WALTZ-DB 2.0: An Updated Database Containing Structural Information of Experimentally Determined Amyloid-Forming Peptides. Nucleic Acids Res. 2020, 48 (D1), D389–D393. 10.1093/nar/gkz758. PubMed DOI PMC

Wozniak P. P.; Kotulska M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31 (20), 3395–3397. 10.1093/bioinformatics/btv375. PubMed DOI

Liu X.; Luo Y.; Li P.; Song S.; Peng J. Deep Geometric Representations for Modeling Effects of Mutations on Protein-Protein Binding Affinity. PLoS Comput. Biol. 2021, 17 (8), e1009284. 10.1371/journal.pcbi.1009284. PubMed DOI PMC

Jankauskaite J.; Jiménez-García B.; Dapkunas J.; Fernández-Recio J.; Moal I. H. SKEMPI 2.0: An Updated Benchmark of Changes in Protein-Protein Binding Energy, Kinetics and Thermodynamics upon Mutation. Bioinformatics 2019, 35 (3), 462–469. 10.1093/bioinformatics/bty635. PubMed DOI PMC

Stourac J.; Dubrava J.; Musil M.; Horackova J.; Damborsky J.; Mazurenko S.; Bednar D. FireProtDB: Database of Manually Curated Protein Stability Data. Nucleic Acids Res. 2021, 49 (D1), D319–D324. 10.1093/nar/gkaa981. PubMed DOI PMC

Pancotti C.; Benevenuta S.; Birolo G.; Alberini V.; Repetto V.; Sanavia T.; Capriotti E.; Fariselli P. Predicting Protein Stability Changes upon Single-Point Mutation: A Thorough Comparison of the Available Tools on a New Dataset. Brief. Bioinform. 2022, 23 (2), bbab555. 10.1093/bib/bbab555. PubMed DOI PMC

Livesey B. J.; Marsh J. A. Updated Benchmarking of Variant Effect Predictors Using Deep Mutational Scanning. bioRxiv 2022, 10.1101/2022.11.19.517196. PubMed DOI PMC

Dunham A. S.; Beltrao P. Exploring Amino Acid Functions in a Deep Mutational Landscape. Mol. Syst. Biol. 2021, 17 (7), e10305. 10.15252/msb.202110305. PubMed DOI PMC

Reeb J.; Wirth T.; Rost B. Variant Effect Predictions Capture Some Aspects of Deep Mutational Scanning Experiments. BMC Bioinformatics 2020, 21, 107. 10.1186/s12859-020-3439-4. PubMed DOI PMC

Gray V. E.; Hause R. J.; Luebeck J.; Shendure J.; Fowler D. M. Quantitative Missense Variant Effect Prediction Using Large-Scale Mutagenesis Data. Cell Syst 2018, 6 (1), 116–124.e3. 10.1016/j.cels.2017.11.003. PubMed DOI PMC

Notin P.; Dias M.; Frazer J.; Hurtado J. M.; Gomez A. N.; Marks D.; Gal Y. Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-Time Retrieval. In Proceedings of the 39th International Conference on Machine Learning; Chaudhuri K., Jegelka S., Song L., Szepesvari C., Niu G., Sabato S., Eds.; Proceedings of Machine Learning Research, Vol. 162; PMLR, 2022; pp 16990–17017.

Markin C. J.; Mokhtari D. A.; Sunden F.; Appel M. J.; Akiva E.; Longwell S. A.; Sabatti C.; Herschlag D.; Fordyce P. M. Revealing Enzyme Functional Architecture via High-Throughput Microfluidic Enzyme Kinetics. Science 2021, 373 (6553), eabf8761. 10.1126/science.abf8761. PubMed DOI PMC

Thompson S.; Zhang Y.; Ingle C.; Reynolds K. A.; Kortemme T. Altered Expression of a Quality Control Protease in E. Coli Reshapes the in Vivo Mutational Landscape of a Model Enzyme. Elife 2020, 9, e53476. 10.7554/eLife.53476. PubMed DOI PMC

Nikoomanzar A.; Vallejo D.; Chaput J. C. Elucidating the Determinants of Polymerase Specificity by Microfluidic-Based Deep Mutational Scanning. ACS Synth. Biol. 2019, 8 (6), 1421–1429. 10.1021/acssynbio.9b00104. PubMed DOI

Mighell T. L.; Thacker S.; Fombonne E.; Eng C.; O’Roak B. J. An Integrated Deep-Mutational-Scanning Approach Provides Clinical Insights on PTEN Genotype-Phenotype Relationships. Am. J. Hum. Genet. 2020, 106 (6), 818–829. 10.1016/j.ajhg.2020.04.014. PubMed DOI PMC

Wang X.; Zhang X.; Peng C.; Shi Y.; Li H.; Xu Z.; Zhu W. D3DistalMutation: A Database to Explore the Effect of Distal Mutations on Enzyme Activity. J. Chem. Inf. Model. 2021, 61 (5), 2499–2508. 10.1021/acs.jcim.1c00318. PubMed DOI

Ma E. J.; Siirola E.; Moore C.; Kummer A.; Stoeckli M.; Faller M.; Bouquet C.; Eggimann F.; Ligibel M.; Huynh D.; Cutler G.; Siegrist L.; Lewis R. A.; Acker A.-C.; Freund E.; Koch E.; Vogel M.; Schlingensiepen H.; Oakeley E. J.; Snajdrova R. Machine-Directed Evolution of an Imine Reductase for Activity and Stereoselectivity. ACS Catal. 2021, 11 (20), 12433–12445. 10.1021/acscatal.1c02786. DOI

Wu Z.; Kan S. B. J.; Lewis R. D.; Wittmann B. J.; Arnold F. H. Machine Learning-Assisted Directed Protein Evolution with Combinatorial Libraries. Proc. Natl. Acad. Sci. U. S. A. 2019, 116 (18), 8852–8858. 10.1073/pnas.1901979116. PubMed DOI PMC

Li G.; Qin Y.; Fontaine N. T.; Ng Fuk Chong M.; Maria-Solano M. A.; Feixas F.; Cadet X. F.; Pandjaitan R.; Garcia-Borràs M.; Cadet F.; Reetz M. T. Machine Learning Enables Selection of Epistatic Enzyme Mutants for Stability Against Unfolding and Detrimental Aggregation. Chembiochem 2021, 22 (5), 904–914. 10.1002/cbic.202000612. PubMed DOI PMC

Sarkar A.; Yang Y.; Vihinen M. Variation Benchmark Datasets: Update, Criteria, Quality and Applications. Database 2020, 2020, baz117. 10.1093/database/baz117. PubMed DOI PMC

Miton C. M.; Tokuriki N. How Mutational Epistasis Impairs Predictability in Protein Evolution and Design. Protein Sci. 2016, 25 (7), 1260–1272. 10.1002/pro.2876. PubMed DOI PMC

Wittmund M.; Cadet F.; Davari M. D. Learning Epistasis and Residue Coevolution Patterns: Current Trends and Future Perspectives for Advancing Enzyme Engineering. ACS Catal. 2022, 12 (22), 14243–14263. 10.1021/acscatal.2c01426. DOI

Yu H.; Ma S.; Li Y.; Dalby P. A. Hot Spots-Making Directed Evolution Easier. Biotechnol. Adv. 2022, 56, 107926. 10.1016/j.biotechadv.2022.107926. PubMed DOI

Sumbalova L.; Stourac J.; Martinek T.; Bednar D.; Damborsky J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46 (W1), W356–W362. 10.1093/nar/gky417. PubMed DOI PMC

Khersonsky O.; Lipsh R.; Avizemer Z.; Ashani Y.; Goldsmith M.; Leader H.; Dym O.; Rogotner S.; Trudeau D. L.; Prilusky J.; Amengual-Rigo P.; Guallar V.; Tawfik D. S.; Fleishman S. J. Automated Design of Efficient and Functionally Diverse Enzyme Repertoires. Mol. Cell 2018, 72 (1), 178–186.e5. 10.1016/j.molcel.2018.08.033. PubMed DOI PMC

Clifton B. E.; Kozome D.; Laurino P. Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design. Biochemistry 2023, 62 (2), 210–220. 10.1021/acs.biochem.1c00757. PubMed DOI

Hie B. L.; Shanker V. R.; Xu D.; Bruun T. U. J.; Weidenbacher P. A.; Tang S.; Wu W.; Pak J. E.; Kim P. S. Efficient Evolution of Human Antibodies from General Protein Language Models. Nat. Biotechnol. 2023, 10.1038/s41587-023-01763-2. PubMed DOI PMC

Goudy O. J.; Nallathambi A.; Kinjo T.; Randolph N.; Kuhlman B. In Silico Evolution of Protein Binders with Deep Learning Models for Structure Prediction and Sequence Design. bioRxiv 2023, 10.1101/2023.05.03.539278. DOI

Linder J.; Bogard N.; Rosenberg A. B.; Seelig G. A Generative Neural Network for Maximizing Fitness and Diversity of Synthetic DNA and Protein Sequences. Cell Syst 2020, 11 (1), 49–62.e16. 10.1016/j.cels.2020.05.007. PubMed DOI PMC

Szegedy C.; Zaremba W.; Sutskever I.; Bruna J.; Erhan D.; Goodfellow I.; Fergus R. Intriguing Properties of Neural Networks. arXiv [cs.CV] 2013, 10.48550/arXiv.1312.6199. DOI

Yu T.; Boob A. G.; Singh N.; Su Y.; Zhao H. In Vitro Continuous Protein Evolution Empowered by Machine Learning and Automation. Cell Syst 2023, 14, 633. 10.1016/j.cels.2023.04.006. PubMed DOI

Yang K. K.; Wu Z.; Arnold F. H. Machine-Learning-Guided Directed Evolution for Protein Engineering. Nat. Methods 2019, 16 (8), 687–694. 10.1038/s41592-019-0496-6. PubMed DOI

Wittmann B. J.; Yue Y.; Arnold F. H. Informed Training Set Design Enables Efficient Machine Learning-Assisted Directed Protein Evolution. Cell Syst 2021, 12 (11), 1026–1045.e7. 10.1016/j.cels.2021.07.008. PubMed DOI

Hie B.; Bryson B. D.; Berger B. Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design. Cell Syst 2020, 11 (5), 461–477.e9. 10.1016/j.cels.2020.09.007. PubMed DOI

Jain M.; Deleu T.; Hartford J.; Liu C.-H.; Hernandez-Garcia A.; Bengio Y. GFlowNets for AI-Driven Scientific Discovery. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.00615. DOI

Bengio E.; Jain M.; Korablyov M.; Precup D.; Bengio Y. Flow Network Based Generative Models for Non-Iterative Diverse Candidate Generation. Adv. Neural Inf. Process. Syst. 2021, 34, 27381–27394.

Qiu Y.; Wei G.-W. CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution. J. Chem. Inf. Model. 2022, 62 (19), 4629–4641. 10.1021/acs.jcim.2c01046. PubMed DOI

Alley E. C.; Khimulya G.; Biswas S.; AlQuraishi M.; Church G. M. Unified Rational Protein Engineering with Sequence-Based Deep Representation Learning. Nat. Methods 2019, 16 (12), 1315–1322. 10.1038/s41592-019-0598-1. PubMed DOI PMC

Biswas S.; Khimulya G.; Alley E. C.; Esvelt K. M.; Church G. M. Low-N Protein Engineering with Data-Efficient Deep Learning. Nat. Methods 2021, 18 (4), 389–396. 10.1038/s41592-021-01100-y. PubMed DOI

Hsu C.; Nisonoff H.; Fannjiang C.; Listgarten J. Learning Protein Fitness Models from Evolutionary and Assay-Labeled Data. Nat. Biotechnol. 2022, 40 (7), 1114–1122. 10.1038/s41587-021-01146-5. PubMed DOI

Zheng Z.; Deng Y.; Xue D.; Zhou Y.; Fei Y. E.; Gu Q. Structure-Informed Language Models Are Protein Designers. arXiv [cs.LG] 2023, 10.48550/arXiv.2302.01649. DOI

Radford A.; Wu J.; Child R.; Luan D.; Amodei D.; Sutskever I. Language Models are Unsupervised Multitask Learners. Life-extension, 2020. https://life-extension.github.io/2020/05/27/GPT%E6%8A%80%E6%9C%AF%E5%88%9D%E6%8E%A2/language-models.pdf (accessed 2023-06-08).

Harris Z. S. Distributional Structure. Word World 1954, 10 (2–3), 146–162. 10.1080/00437956.1954.11659520. DOI

Elnaggar A.; Heinzinger M.; Dallago C.; Rehawi G.; Wang Y.; Jones L.; Gibbs T.; Feher T.; Angerer C.; Steinegger M.; Bhowmik D.; Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44 (10), 7112–7127. 10.1109/TPAMI.2021.3095381. PubMed DOI

Clifford J. N.; Høie M. H.; Deleuran S.; Peters B.; Nielsen M.; Marcatili P. BepiPred-3.0: Improved B-Cell Epitope Prediction Using Protein Language Models. Protein Sci. 2022, 31 (12), e4497. 10.1002/pro.4497. PubMed DOI PMC

Elnaggar A.; Essam H.; Salah-Eldin W.; Moustafa W.; Elkerdawy M.; Rochereau C.; Rost B. Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling. bioRxiv 2023, 10.1101/2023.01.16.524265. DOI

Pokharel S.; Pratyush P.; Heinzinger M.; Newman R. H.; Kc D. B. Improving Protein Succinylation Sites Prediction Using Embeddings from Protein Language Model. Sci. Rep. 2022, 12, 16933. 10.1038/s41598-022-21366-2. PubMed DOI PMC

Houlsby N.; Giurgiu A.; Jastrzebski S.; Morrone B.; De Laroussilhe Q.; Gesmundo A.; Attariyan M.; Gelly S. Parameter-Efficient Transfer Learning for NLP. In Proceedings of the 36th International Conference on Machine Learning; Chaudhuri K., Salakhutdinov R., Eds.; Proceedings of Machine Learning Research, Vol. 97; PMLR, 2019; pp 2790–2799.

Yang W.; Liu C.; Li Z. Lightweight Fine-Tuning a Pretrained Protein Language Model for Protein Secondary Structure Prediction. bioRxiv (Bioengineering), March 23, 2023, 2023.03.22.530066, ver. 1. 10.1101/2023.03.22.530066. DOI

Suzek B. E.; Wang Y.; Huang H.; McGarvey P. B.; Wu C. H. UniProt Consortium. UniRef Clusters: A Comprehensive and Scalable Alternative for Improving Sequence Similarity Searches. Bioinformatics 2015, 31 (6), 926–932. 10.1093/bioinformatics/btu739. PubMed DOI PMC

Nijkamp E.; Ruffolo J.; Weinstein E. N.; Naik N.; Madani A. ProGen2: Exploring the Boundaries of Protein Language Models. arXiv [cs.LG] 2022, 10.48550/arXiv.2206.13517. PubMed DOI

Finn R. D.; Bateman A.; Clements J.; Coggill P.; Eberhardt R. Y.; Eddy S. R.; Heger A.; Hetherington K.; Holm L.; Mistry J.; Sonnhammer E. L. L.; Tate J.; Punta M. Pfam: The Protein Families Database. Nucleic Acids Res. 2014, 42, D222–D230. 10.1093/nar/gkt1223. PubMed DOI PMC

Joosten R. P.; Salzemann J.; Bloch V.; Stockinger H.; Berglund A.-C.; Blanchet C.; Bongcam-Rudloff E.; Combet C.; Da Costa A. L.; Deleage G.; Diarena M.; Fabbretti R.; Fettahi G.; Flegel V.; Gisel A.; Kasam V.; Kervinen T.; Korpelainen E.; Mattila K.; Pagni M.; Reichstadt M.; Breton V.; Tickle I. J.; Vriend G. PDB_REDO: Automated Re-Refinement of X-Ray Structure Models in the PDB. J. Appl. Crystallogr. 2009, 42, 376–384. 10.1107/S0021889809008784. PubMed DOI PMC

Dauparas J.; Anishchenko I.; Bennett N.; Bai H.; Ragotte R. J.; Milles L. F.; Wicky B. I. M.; Courbet A.; de Haas R. J.; Bethel N.; Leung P. J. Y.; Huddy T. F.; Pellock S.; Tischer D.; Chan F.; Koepnick B.; Nguyen H.; Kang A.; Sankaran B.; Bera A. K.; King N. P.; Baker D. Robust Deep Learning–Based Protein Sequence Design Using ProteinMPNN. Science 2022, 378 (6615), 49–56. 10.1126/science.add2187. PubMed DOI PMC

Sillitoe I.; Lewis T. E.; Cuff A.; Das S.; Ashford P.; Dawson N. L.; Furnham N.; Laskowski R. A.; Lee D.; Lees J. G.; Lehtinen S.; Studer R. A.; Thornton J.; Orengo C. A. CATH: Comprehensive Structural and Functional Annotations for Genome Sequences. Nucleic Acids Res. 2015, 43, D376–D381. 10.1093/nar/gku947. PubMed DOI PMC

Varadi M.; Anyango S.; Deshpande M.; Nair S.; Natassia C.; Yordanova G.; Yuan D.; Stroe O.; Wood G.; Laydon A.; Žídek A.; Green T.; Tunyasuvunakool K.; Petersen S.; Jumper J.; Clancy E.; Green R.; Vora A.; Lutfi M.; Figurnov M.; Cowie A.; Hobbs N.; Kohli P.; Kleywegt G.; Birney E.; Hassabis D.; Velankar S. AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models. Nucleic Acids Res. 2022, 50 (D1), D439–D444. 10.1093/nar/gkab1061. PubMed DOI PMC

Rao R. M.; Liu J.; Verkuil R.; Meier J.; Canny J.; Abbeel P.; Sercu T.; Rives A. MSA Transformer. In Proceedings of the 38th International Conference on Machine Learning; Meila M., Zhang T., Eds.; Proceedings of Machine Learning Research, Vol. 139; PMLR, 2021; pp 8844–8856.

Ho J.; Kalchbrenner N.; Weissenborn D.; Salimans T. Axial Attention in Multidimensional Transformers. arXiv [cs.CV] 2019, 10.48550/arXiv.1912.12180. DOI

Repecka D.; Jauniskis V.; Karpus L.; Rembeza E.; Rokaitis I.; Zrimec J.; Poviloniene S.; Laurynenas A.; Viknander S.; Abuajwa W.; Savolainen O.; Meskys R.; Engqvist M. K. M.; Zelezniak A. Expanding Functional Protein Sequence Spaces Using Generative Adversarial Networks. Nature Machine Intelligence 2021, 3 (4), 324–333. 10.1038/s42256-021-00310-5. DOI

Sevgen E.; Moller J.; Lange A.; Parker J.; Quigley S.; Mayer J.; Srivastava P.; Gayatri S.; Hosfield D.; Korshunova M.; Livne M.; Gill M.; Ranganathan R.; Costa A. B.; Ferguson A. L. ProT-VAE: Protein Transformer Variational AutoEncoder for Functional Protein Design. bioRxiv (Synthetic Biology), January 24, 2023, 2023.01.23.525232, ver. 1. 10.1101/2023.01.23.525232. DOI

Luo Y.; Jiang G.; Yu T.; Liu Y.; Vo L.; Ding H.; Su Y.; Qian W. W.; Zhao H.; Peng J. ECNet Is an Evolutionary Context-Integrated Deep Learning Framework for Protein Engineering. Nat. Commun. 2021, 12, 5743. 10.1038/s41467-021-25976-8. PubMed DOI PMC

Hochreiter S.; Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997, 9 (8), 1735–1780. 10.1162/neco.1997.9.8.1735. PubMed DOI

Baek M.; DiMaio F.; Anishchenko I.; Dauparas J.; Ovchinnikov S.; Lee G. R.; Wang J.; Cong Q.; Kinch L. N.; Schaeffer R. D.; Millán C.; Park H.; Adams C.; Glassman C. R.; DeGiovanni A.; Pereira J. H.; Rodrigues A. V.; van Dijk A. A.; Ebrecht A. C.; Opperman D. J.; Sagmeister T.; Buhlheller C.; Pavkov-Keller T.; Rathinaswamy M. K.; Dalwadi U.; Yip C. K.; Burke J. E.; Garcia K. C.; Grishin N. V.; Adams P. D.; Read R. J.; Baker D. Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network. Science 2021, 373 (6557), 871–876. 10.1126/science.abj8754. PubMed DOI PMC

Illig A.-M.; Siedhoff N. E.; Schwaneberg U.; Davari M. D. A Hybrid Model Combining Evolutionary Probability and Machine Learning Leverages Data-Driven Protein Engineering. bioRxiv 2022, 10.1101/2022.06.07.495081. DOI

Ding X.; Zou Z.; Brooks C. L., III. Deciphering Protein Evolution and Fitness Landscapes with Latent Space Models. Nat. Commun. 2019, 10, 5644. 10.1038/s41467-019-13633-0. PubMed DOI PMC

Kohout P.; Vasina M.; Majerova M.; Novakova V.; Damborsky J.; Bednar D.; et al. Design of Enzymes for Biocatalysis, Bioremediation and Biosensing Using Variational Autoencoder-Generated Latent Space. ChemRxiv. Cambridge: Cambridge Open Engage, 2023. 10.26434/chemrxiv-2023-jcds7. DOI

Ziegler C.; Martin J.; Sinner C.; Morcos F. Latent Generative Landscapes as Maps of Functional Diversity in Protein Sequence Space. Nat. Commun. 2023, 14, 2222. 10.1038/s41467-023-37958-z. PubMed DOI PMC

Moffat L.; Jones D. T. Increasing the Accuracy of Single Sequence Prediction Methods Using a Deep Semi-Supervised Learning Framework. Bioinformatics 2021, 37 (21), 3744–3751. 10.1093/bioinformatics/btab491. PubMed DOI PMC

Bepler T.; Berger B. Learning Protein Sequence Embeddings Using Information from Structure. International Conference on Learning Representations, New Orleans, LA, May 6–9, 2019; OpenReview, 2019. https://openreview.net/forum?id=SygLehCqtm

Rao R.; Bhattacharya N.; Thomas N.; Duan Y.; Chen X.; Canny J.; Abbeel P.; Song Y. S. Evaluating Protein Transfer Learning with TAPE. Adv. Neural Inf. Process. Syst. 2019, 32, 9689–9701. PubMed PMC

Crean R. M.; Gardner J. M.; Kamerlin S. C. L. Harnessing Conformational Plasticity to Generate Designer Enzymes. J. Am. Chem. Soc. 2020, 142 (26), 11324–11342. 10.1021/jacs.0c04924. PubMed DOI PMC

Guo H.-B.; Perminov A.; Bekele S.; Kedziora G.; Farajollahi S.; Varaljay V.; Hinkle K.; Molinero V.; Meister K.; Hung C.; Dennis P.; Kelley-Loughnane N.; Berry R. AlphaFold2 Models Indicate That Protein Sequence Determines Both Structure and Dynamics. Sci. Rep. 2022, 12, 10696. 10.1038/s41598-022-14382-9. PubMed DOI PMC

Faidon Brotzakis Z.; Zhang S.; Vendruscolo M. AlphaFold Prediction of Structural Ensembles of Disordered Proteins. bioRxiv (Biophysics), January 19, 2023, 2023.01.19.524720, ver. 1. 10.1101/2023.01.19.524720. DOI

Piana S.; Laio A. Advillin Folding Takes Place on a Hypersurface of Small Dimensionality. Phys. Rev. Lett. 2008, 101 (20), 208101. 10.1103/PhysRevLett.101.208101. PubMed DOI

Glielmo A.; Husic B. E.; Rodriguez A.; Clementi C.; Noé F.; Laio A. Unsupervised Learning Methods for Molecular Simulation Data. Chem. Rev. 2021, 121 (16), 9722–9758. 10.1021/acs.chemrev.0c01195. PubMed DOI PMC

Mardt A.; Pasquali L.; Wu H.; Noé F. VAMPnets for Deep Learning of Molecular Kinetics. Nat. Commun. 2018, 9, 5. 10.1038/s41467-017-02388-1. PubMed DOI PMC

Marques S. M.; Kouba P.; Legrand A.; Sedlar J.; Disson L.; Planas-Iglesias J.; Sanusi Z.; Kunka A.; Damborsky J.; Pajdla T.; Prokop Z.; Mazurenko S.; Sivic J.; Bednar D. Effects of Alzheimer’s Disease Drug Candidates on Disordered Aβ42 Dissected by Comparative Markov State Analysis (CoVAMPnet). bioRxiv (Biophysics), January 6, 2023, 2023.01.06.523007, ver. 1. 10.1101/2023.01.06.523007. DOI

Ward M. D.; Zimmerman M. I.; Meller A.; Chung M.; Swamidass S. J.; Bowman G. R. Deep Learning the Structural Determinants of Protein Biochemical Properties by Comparing Structural Ensembles with DiffNets. Nat. Commun. 2021, 12, 3023. 10.1038/s41467-021-23246-1. PubMed DOI PMC

Akere A.; Chen S. H.; Liu X.; Chen Y.; Dantu S. C.; Pandini A.; Bhowmik D.; Haider S. Structure-Based Enzyme Engineering Improves Donor-Substrate Recognition of Arabidopsis Thaliana Glycosyltransferases. Biochem. J. 2020, 477 (15), 2791–2805. 10.1042/BCJ20200477. PubMed DOI PMC

Russ W. P.; Figliuzzi M.; Stocker C.; Barrat-Charlaix P.; Socolich M.; Kast P.; Hilvert D.; Monasson R.; Cocco S.; Weigt M.; Ranganathan R. An Evolution-Based Model for Designing Chorismate Mutase Enzymes. Science 2020, 369 (6502), 440–445. 10.1126/science.aba3304. PubMed DOI

Lu H.; Diaz D. J.; Czarnecki N. J.; Zhu C.; Kim W.; Shroff R.; Acosta D. J.; Alexander B. R.; Cole H. O.; Zhang Y.; Lynd N. A.; Ellington A. D.; Alper H. S. Machine Learning-Aided Engineering of Hydrolases for PET Depolymerization. Nature 2022, 604 (7907), 662–667. 10.1038/s41586-022-04599-z. PubMed DOI

Paik I.; Ngo P. H. T.; Shroff R.; Diaz D. J.; Maranhao A. C.; Walker D. J. F.; Bhadra S.; Ellington A. D. Improved Bst DNA Polymerase Variants Derived via a Machine Learning Approach. Biochemistry 2023, 62 (2), 410–418. 10.1021/acs.biochem.1c00451. PubMed DOI PMC

Weinstein J. J.; Goldenzweig A.; Hoch S.; Fleishman S. J. PROSS 2: A New Server for the Design of Stable and Highly Expressed Protein Variants. Bioinformatics 2021, 37 (1), 123–125. 10.1093/bioinformatics/btaa1071. PubMed DOI PMC

Musil M.; Stourac J.; Bendl J.; Brezovsky J.; Prokop Z.; Zendulka J.; Martinek T.; Bednar D.; Damborsky J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45 (W1), W393–W399. 10.1093/nar/gkx285. PubMed DOI PMC

Kunka A.; Marques S.; Havlasek M.; Vasina M.; Velatova N.; Cengelova L.; Kovar D.; Damborsky J.; Marek M.; Bednar D.; Prokop Z. Advancing Enzyme’s Stability and Catalytic Efficiency through Synergy of Force-Field Calculations, Evolutionary Analysis and Machine Learning. ACS Catal. 2023, 13, 12506–12518. 10.1021/acscatal.3c02575. PubMed DOI PMC

Wicky B. I. M.; Milles L. F.; Courbet A.; Ragotte R. J.; Dauparas J.; Kinfu E.; Tipps S.; Kibler R. D.; Baek M.; DiMaio F.; Li X.; Carter L.; Kang A.; Nguyen H.; Bera A. K.; Baker D. Hallucinating Symmetric Protein Assemblies. Science 2022, 378 (6615), 56–61. 10.1126/science.add1964. PubMed DOI PMC

Hawkins-Hooker A.; Depardieu F.; Baur S.; Couairon G.; Chen A.; Bikard D. Generating Functional Protein Variants with Variational Autoencoders. PLoS Comput. Biol. 2021, 17 (2), e1008736. 10.1371/journal.pcbi.1008736. PubMed DOI PMC

Vasina M.; Vanacek P.; Hon J.; Kovar D.; Faldynova H.; Kunka A.; Buryska T.; Badenhorst C. P. S.; Mazurenko S.; Bednar D.; Stavrakis S.; Bornscheuer U. T.; deMello A.; Damborsky J.; Prokop Z. Advanced Database Mining of Efficient Haloalkane Dehalogenases by Sequence and Structure Bioinformatics and Microfluidics. Chem. Catalysis 2022, 2 (10), 2704–2725. 10.1016/j.checat.2022.09.011. DOI

Pardo I.; Bednar D.; Calero P.; Volke D. C.; Damborský J.; Nikel P. I. A Nonconventional Archaeal Fluorinase Identified by In Silico Mining for Enhanced Fluorine Biocatalysis. ACS Catal. 2022, 12 (11), 6570–6577. 10.1021/acscatal.2c01184. PubMed DOI PMC

Yeh A. H.-W.; Norn C.; Kipnis Y.; Tischer D.; Pellock S. J.; Evans D.; Ma P.; Lee G. R.; Zhang J. Z.; Anishchenko I.; Coventry B.; Cao L.; Dauparas J.; Halabiya S.; DeWitt M.; Carter L.; Houk K. N.; Baker D. De Novo Design of Luciferases Using Deep Learning. Nature 2023, 614 (7949), 774–780. 10.1038/s41586-023-05696-3. PubMed DOI PMC

Büchler J.; Malca S. H.; Patsch D.; Voss M.; Turner N. J.; Bornscheuer U. T.; Allemann O.; Le Chapelain C.; Lumbroso A.; Loiseleur O.; Buller R. Algorithm-Aided Engineering of Aliphatic Halogenase WelO5* for the Asymmetric Late-Stage Functionalization of Soraphens. Nat. Commun. 2022, 13, 371. 10.1038/s41467-022-27999-1. PubMed DOI PMC

Saito Y.; Oikawa M.; Sato T.; Nakazawa H.; Ito T.; Kameda T.; Tsuda K.; Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal. 2021, 11 (23), 14615–14624. 10.1021/acscatal.1c03753. DOI

Greenhalgh J. C.; Fahlberg S. A.; Pfleger B. F.; Romero P. A. Machine Learning-Guided Acyl-ACP Reductase Engineering for Improved in Vivo Fatty Alcohol Production. Nat. Commun. 2021, 12, 5825. 10.1038/s41467-021-25831-w. PubMed DOI PMC

Schenkmayerova A.; Pinto G. P.; Toul M.; Marek M.; Hernychova L.; Planas-Iglesias J.; Daniel Liskova V.; Pluskal D.; Vasina M.; Emond S.; Dörr M.; Chaloupkova R.; Bednar D.; Prokop Z.; Hollfelder F.; Bornscheuer U. T.; Damborsky J. Engineering the Protein Dynamics of an Ancestral Luciferase. Nat. Commun. 2021, 12, 3616. 10.1038/s41467-021-23450-z. PubMed DOI PMC

Chaloupkova R.; Liskova V.; Toul M.; Markova K.; Sebestova E.; Hernychova L.; Marek M.; Pinto G. P.; Pluskal D.; Waterman J.; Prokop Z.; Damborsky J. Light-Emitting Dehalogenases: Reconstruction of Multifunctional Biocatalysts. ACS Catal. 2019, 9 (6), 4810–4823. 10.1021/acscatal.9b01031. DOI

Klesmith J. R.; Bacik J.-P.; Wrenbeck E. E.; Michalczyk R.; Whitehead T. A. Trade-Offs between Enzyme Fitness and Solubility Illuminated by Deep Mutational Scanning. Proc. Natl. Acad. Sci. U. S. A. 2017, 114 (9), 2265–2270. 10.1073/pnas.1614437114. PubMed DOI PMC

MacLeod B. P.; Parlane F. G. L.; Rupnow C. C.; Dettelbach K. E.; Elliott M. S.; Morrissey T. D.; Haley T. H.; Proskurin O.; Rooney M. B.; Taherimakhsousi N.; Dvorak D. J.; Chiu H. N.; Waizenegger C. E. B.; Ocean K.; Mokhtari M.; Berlinguette C. P. A Self-Driving Laboratory Advances the Pareto Front for Material Properties. Nat. Commun. 2022, 13, 995. 10.1038/s41467-022-28580-6. PubMed DOI PMC

Li W.; Yao X.; Zhang T.; Wang R.; Wang L. Hierarchy Ranking Method for Multimodal Multi-Objective Optimization with Local Pareto Fronts. IEEE Trans. Evol. Computat. 2023, 27, 98. 10.1109/TEVC.2022.3155757. DOI

Miton C. M.; Tokuriki N. Insertions and Deletions (Indels): A Missing Piece of the Protein Engineering Jigsaw. Biochemistry 2023, 62 (2), 148–157. 10.1021/acs.biochem.2c00188. PubMed DOI

Gonzalez C. E.; Roberts P.; Ostermeier M. Fitness Effects of Single Amino Acid Insertions and Deletions in TEM-1 β-Lactamase. J. Mol. Biol. 2019, 431 (12), 2320–2330. 10.1016/j.jmb.2019.04.030. PubMed DOI PMC

Fan X.; Pan H.; Tian A.; Chung W. K.; Shen Y. SHINE: Protein Language Model-Based Pathogenicity Prediction for Short Inframe Insertion and Deletion Variants. Brief. Bioinform. 2023, 24 (1), bbac584. 10.1093/bib/bbac584. PubMed DOI PMC

Ross C. M.; Foley G.; Boden M.; Gillam E. M. J. Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol. Biol. 2022, 2397, 85–110. 10.1007/978-1-0716-1826-4_6. PubMed DOI

Park H.-S.; Nam S.-H.; Lee J. K.; Yoon C. N.; Mannervik B.; Benkovic S. J.; Kim H.-S. Design and Evolution of New Catalytic Activity with an Existing Protein Scaffold. Science 2006, 311 (5760), 535–538. 10.1126/science.1118953. PubMed DOI

Babkova P.; Sebestova E.; Brezovsky J.; Chaloupkova R.; Damborsky J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. Chembiochem 2017, 18 (14), 1448–1456. 10.1002/cbic.201700197. PubMed DOI

Arpino J. A. J.; Rizkallah P. J.; Jones D. D. Structural and Dynamic Changes Associated with Beneficial Engineered Single-Amino-Acid Deletion Mutations in Enhanced Green Fluorescent Protein. Acta Crystallogr. D Biol. Crystallogr. 2014, 70 (8), 2152–2162. 10.1107/S139900471401267X. PubMed DOI PMC

Dumas A.; Lercher L.; Spicer C. D.; Davis B. G. Designing Logical Codon Reassignment - Expanding the Chemistry in Biology. Chem. Sci. 2015, 6 (1), 50–69. 10.1039/C4SC01534G. PubMed DOI PMC

Hankore E. D.; Zhang L.; Chen Y.; Liu K.; Niu W.; Guo J. Genetic Incorporation of Noncanonical Amino Acids Using Two Mutually Orthogonal Quadruplet Codons. ACS Synth. Biol. 2019, 8 (5), 1168–1174. 10.1021/acssynbio.9b00051. PubMed DOI PMC

An X.; Chen C.; Wang T.; Huang A.; Zhang D.; Han M.-J.; Wang J. Genetic Incorporation of Selenotyrosine Significantly Improves Enzymatic Activity of Agrobacterium Radiobacter Phosphotriesterase. Chembiochem 2021, 22 (15), 2535–2539. 10.1002/cbic.202000460. PubMed DOI

Zhang H.; Zheng Z.; Dong L.; Shi N.; Yang Y.; Chen H.; Shen Y.; Xia Q. Rational Incorporation of Any Unnatural Amino Acid into Proteins by Machine Learning on Existing Experimental Proofs. Comput. Struct. Biotechnol. J. 2022, 20, 4930–4941. 10.1016/j.csbj.2022.08.063. PubMed DOI PMC

Gainza P.; Sverrisson F.; Monti F.; Rodolà E.; Boscaini D.; Bronstein M. M.; Correia B. E. Deciphering Interaction Fingerprints from Protein Molecular Surfaces Using Geometric Deep Learning. Nat. Methods 2020, 17 (2), 184–192. 10.1038/s41592-019-0666-6. PubMed DOI

Ketata M. A.; Laue C.; Mammadov R.; Stark H.; Wu M.; Corso G.; Marquet C.; Barzilay R.; Jaakkola T. S.. DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models. In The Eleventh International Conference on Learning Representations, Kigali, Rwanda, May 1–5, 2023; OpenReview, 2023. https://openreview.net/pdf?id=AM7WbQxuRS

Geng C.; Xue L. C.; Roel-Touris J.; Bonvin A. M. J. J. Finding the ΔΔG Spot: Are Predictors of Binding Affinity Changes upon Mutations in Protein–Protein Interactions Ready for It?. WIREs Comput. Mol. Sci. 2019, 9 (5), e1410. 10.1002/wcms.1410. DOI

Jiang Y.; Quan L.; Li K.; Li Y.; Zhou Y.; Wu T.; Lyu Q. DGCddG: Deep Graph Convolution for Predicting Protein-Protein Binding Affinity Changes Upon Mutations. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20 (3), 2089–2100. 10.1109/TCBB.2022.3233627. PubMed DOI

Shan S.; Luo S.; Yang Z.; Hong J.; Su Y.; Ding F.; Fu L.; Li C.; Chen P.; Ma J.; Shi X.; Zhang Q.; Berger B.; Zhang L.; Peng J. Deep Learning Guided Optimization of Human Antibody against SARS-CoV-2 Variants with Broad Neutralization. Proc. Natl. Acad. Sci. U. S. A. 2022, 119 (11), e2122954119. 10.1073/pnas.2122954119. PubMed DOI PMC

Jin W.; Sarkizova S.; Chen X.; Hacohen N.; Uhler C. Unsupervised Protein-Ligand Binding Energy Prediction via Neural Euler’s Rotation Equation. arXiv [q-bio.BM] 2023, 10.48550/arXiv.2301.10814. DOI

Jiang Y.; Neti S. S.; Sitarik I.; Pradhan P.; To P.; Xia Y.; Fried S. D.; Booker S. J.; O’Brien E. P. How Synonymous Mutations Alter Enzyme Structure and Function over Long Timescales. Nat. Chem. 2023, 15 (3), 308–318. 10.1038/s41557-022-01091-z. PubMed DOI PMC

Nikolados E.-M.; Oyarzún D. A. Deep Learning for Optimization of Protein Expression. Curr. Opin. Biotechnol. 2023, 81, 102941. 10.1016/j.copbio.2023.102941. PubMed DOI

Rosenberg A. A.; Marx A.; Bronstein A. M. Codon-Specific Ramachandran Plots Show Amino Acid Backbone Conformation Depends on Identity of the Translated Codon. Nat. Commun. 2022, 13, 2815. 10.1038/s41467-022-30390-9. PubMed DOI PMC

Saunders R.; Deane C. M. Synonymous Codon Usage Influences the Local Protein Structure Observed. Nucleic Acids Res. 2010, 38 (19), 6719–6728. 10.1093/nar/gkq495. PubMed DOI PMC

Outeiral C.; Deane C. M. Codon Language Embeddings Provide Strong Signals for Protein Engineering. bioRxiv 2022, 10.1101/2022.12.15.519894. DOI

Constant D. A.; Gutierrez J. M.; Sastry A. V.; Viazzo R.; Smith N. R.; Hossain J.; Spencer D. A.; Carter H.; Ventura A. B.; Louie M. T. M.; Kohnert C.; Consbruck R.; Bennett J.; Crawford K. A.; Sutton J. M.; Morrison A.; Steiger A. K.; Jackson K. A.; Stanton J. T.; Abdulhaqq S.; Hannum G.; Meier J.; Weinstock M.; Gander M.. Deep Learning-Based Codon Optimization with Large-Scale Synonymous Variant Datasets Enables Generalized Tunable Protein Expression. bioRxiv (Synthetic Biology), February 12, 2023, 2023.02.11.528149, ver. 1. 10.1101/2023.02.11.528149. DOI

Ruscio J. Z.; Kohn J. E.; Ball K. A.; Head-Gordon T. The Influence of Protein Dynamics on the Success of Computational Enzyme Design. J. Am. Chem. Soc. 2009, 131 (39), 14111–14115. 10.1021/ja905396s. PubMed DOI PMC

Peccati F.; Alunno-Rufini S.; Jiménez-Osés G. Accurate Prediction of Enzyme Thermostabilization with Rosetta Using AlphaFold Ensembles. J. Chem. Inf. Model. 2023, 63 (3), 898–909. 10.1021/acs.jcim.2c01083. PubMed DOI PMC

Acevedo-Rocha C. G.; Li A.; D’Amore L.; Hoebenreich S.; Sanchis J.; Lubrano P.; Ferla M. P.; Garcia-Borràs M.; Osuna S.; Reetz M. T. Pervasive Cooperative Mutational Effects on Multiple Catalytic Enzyme Traits Emerge via Long-Range Conformational Dynamics. Nat. Commun. 2021, 12, 1621. 10.1038/s41467-021-21833-w. PubMed DOI PMC

Bonk B. M.; Weis J. W.; Tidor B. Machine Learning Identifies Chemical Characteristics That Promote Enzyme Catalysis. J. Am. Chem. Soc. 2019, 141 (9), 4108–4118. 10.1021/jacs.8b13879. PubMed DOI PMC

Zhong E. D.; Bepler T.; Berger B.; Davis J. H. CryoDRGN: Reconstruction of Heterogeneous Cryo-EM Structures Using Neural Networks. Nat. Methods 2021, 18 (2), 176–185. 10.1038/s41592-020-01049-4. PubMed DOI PMC

Jia K.; Kilinc M.; Jernigan R. L. Functional Protein Dynamics Directly from Sequences. J. Phys. Chem. B 2023, 127 (9), 1914–1921. 10.1021/acs.jpcb.2c05766. PubMed DOI PMC

Wang T.; Zhu J.-Y.; Torralba A.; Efros A. A. Dataset Distillation. arXiv [cs.LG] 2018, 10.48550/arXiv.1811.10959. DOI

Hinton G.; Vinyals O.; Dean J. Distilling the Knowledge in a Neural Network. arXiv [stat.ML] 2015, 10.48550/arXiv.1503.02531. DOI

Deng L. The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]. IEEE Signal Process. Mag. 2012, 29 (6), 141–142. 10.1109/MSP.2012.2211477. DOI

Le T.-T.-H.; Larasati H. T.; Prihatno A. T.; Kim H.. A Review of Dataset Distillation for Deep Learning. In Proceedings of the 2022 International Conference on Platform Technology and Service (PlatCon), Jeju, South Korea, August 22–24, 2022; IEEE, 2022; pp 34–37. 10.1109/PlatCon55845.2022.9932086 DOI

Yu R.; Liu S.; Wang X. Dataset Distillation: A Comprehensive Review. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.07014. PubMed DOI

Lei S.; Tao D. A Comprehensive Survey of Dataset Distillation. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.05603. PubMed DOI

Abraham M.; Apostolov R.; Barnoud J.; Bauer P.; Blau C.; Bonvin A. M. J. J.; Chavent M.; Chodera J.; Čondić-Jurkić K.; Delemotte L.; Grubmüller H.; Howard R. J.; Jordan E. J.; Lindahl E.; Ollila O. H. S.; Selent J.; Smith D. G. A.; Stansfeld P. J.; Tiemann J. K. S.; Trellet M.; Woods C.; Zhmurov A. Sharing Data from Molecular Simulations. J. Chem. Inf. Model. 2019, 59 (10), 4093–4099. 10.1021/acs.jcim.9b00665. PubMed DOI

Serafeim A.-P.; Salamanos G.; Patapati K. K.; Glykos N. M. Sensitivity of Folding Molecular Dynamics Simulations to Even Minor Force Field Changes. J. Chem. Inf. Model. 2016, 56 (10), 2035–2041. 10.1021/acs.jcim.6b00493. PubMed DOI

Wilkinson M. D.; Dumontier M.; Aalbersberg I. J. J.; Appleton G.; Axton M.; Baak A.; Blomberg N.; Boiten J.-W.; da Silva Santos L. B.; Bourne P. E.; Bouwman J.; Brookes A. J.; Clark T.; Crosas M.; Dillo I.; Dumon O.; Edmunds S.; Evelo C. T.; Finkers R.; Gonzalez-Beltran A.; Gray A. J. G.; Groth P.; Goble C.; Grethe J. S.; Heringa J.; ’t Hoen P. A. C.; Hooft R.; Kuhn T.; Kok R.; Kok J.; Lusher S. J.; Martone M. E.; Mons A.; Packer A. L.; Persson B.; Rocca-Serra P.; Roos M.; van Schaik R.; Sansone S.-A.; Schultes E.; Sengstag T.; Slater T.; Strawn G.; Swertz M. A.; Thompson M.; van der Lei J.; van Mulligen E.; Velterop J.; Waagmeester A.; Wittenburg P.; Wolstencroft K.; Zhao J.; Mons B. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. 10.1038/sdata.2016.18. PubMed DOI PMC

Tiemann J. K. S.; Szczuka M.; Bouarroudj L.; Oussaren M.; Garcia S.; Howard R. J.; Delemotte L.; Lindahl E.; Baaden M.; Lindorff-Larsen K.; Chavent M.; Poulain P. MDverse: Shedding Light on the Dark Matter of Molecular Dynamics Simulations. bioRxiv 2023, 10.1101/2023.05.02.538537. PubMed DOI PMC

Durumeric A. E. P.; Charron N. E.; Templeton C.; Musil F.; Bonneau K.; Pasos-Trejo A. S.; Chen Y.; Kelkar A.; Noé F.; Clementi C. Machine Learned Coarse-Grained Protein Force-Fields: Are We There Yet?. Curr. Opin. Struct. Biol. 2023, 79, 102533. 10.1016/j.sbi.2023.102533. PubMed DOI PMC

Beyer L.; Hénaff O. J.; Kolesnikov A.; Zhai X.; van den Oord A. Are We Done with ImageNet?. arXiv [cs.CV] 2020, 10.48550/arXiv.2006.07159. DOI

Everingham M.; Van Gool L.; Williams C. K. I.; Winn J.; Zisserman A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303. 10.1007/s11263-009-0275-4. DOI

Deng J.; Dong W.; Socher R.; Li L.-J.; Li K.; Fei-Fei L.. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, June 20–25, 2009; IEEE, 2009; pp 248–255. 10.1109/CVPR.2009.5206848 DOI

LeCun Y.; Haffner P.; Bottou L.; Bengio Y.. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision; Springer, 1999; pp 319–345.

He K.; Zhang X.; Ren S.; Sun J.. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016; IEEE, 2016; pp 770–778. 10.1109/CVPR.2016.90 DOI

Thiyagalingam J.; Shankar M.; Fox G.; Hey T. Scientific Machine Learning Benchmarks. Nature Reviews Physics 2022, 4 (6), 413–420. 10.1038/s42254-022-00441-7. DOI

Steinegger M.; Söding J. Clustering Huge Protein Sequence Sets in Linear Time. Nat. Commun. 2018, 9, 2542. 10.1038/s41467-018-04964-5. PubMed DOI PMC

Gao M.; Skolnick J. Structural Space of Protein-Protein Interfaces Is Degenerate, Close to Complete, and Highly Connected. Proc. Natl. Acad. Sci. U. S. A. 2010, 107 (52), 22517–22522. 10.1073/pnas.1012820107. PubMed DOI PMC

Burra P. V.; Zhang Y.; Godzik A.; Stec B. Global Distribution of Conformational States Derived from Redundant Models in the PDB Points to Non-Uniqueness of the Protein Structure. Proc. Natl. Acad. Sci. U. S. A. 2009, 106 (26), 10505–10510. 10.1073/pnas.0812152106. PubMed DOI PMC

Robin X.; Leemann M.; Sagasta A.; Eberhardt J.; Schwede T.; Durairaj J. Automated Benchmarking of Combined Protein Structure and Ligand Conformation Prediction. Authorea Preprints 2023, 10.22541/au.168382988.85108031/v1. PubMed DOI

Wang R.; Fang X.; Lu Y.; Yang C.-Y.; Wang S. The PDBbind Database: Methodologies and Updates. J. Med. Chem. 2005, 48 (12), 4111–4119. 10.1021/jm048957q. PubMed DOI

Dallago C.; Mou J.; Johnston K. E.; Wittmann B.; Bhattacharya N.; Goldman S.; Madani A.; Yang K. K.. FLIP: Benchmark Tasks in Fitness Landscape Inference for Proteins. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, Vol. 1; Vanschoren J., Yeung S., Eds.; Curran Associates, Inc.: Red Hook, NY, 2021.

Morehead A.; Chen C.; Sedova A.; Cheng J. DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction. arXiv [q-bio.QM] 2021, 10.48550/arXiv.2106.04362. PubMed DOI PMC

Kryshtafovych A.; Schwede T.; Topf M.; Fidelis K.; Moult J. Critical Assessment of Methods of Protein Structure Prediction (CASP)-Round XIV. Proteins 2021, 89 (12), 1607–1617. 10.1002/prot.26237. PubMed DOI PMC

Janin J.; Henrick K.; Moult J.; Eyck L. T.; Sternberg M. J. E.; Vajda S.; Vakser I.; Wodak S. J. Critical Assessment of PRedicted Interactions. CAPRI: A Critical Assessment of PRedicted Interactions. Proteins 2003, 52 (1), 2–9. 10.1002/prot.10381. PubMed DOI

Andreoletti G.; Hoskins R. A.; Repo S.; Barsky D.; Brenner S. E.; Moult J.; Participants C. Abstract 3295: CAGI: The Critical Assessment of Genome Interpretation, a Community Experiment to Evaluate Phenotype Prediction: Implications for Predicting Impact of Variants in Cancer. Cancer Res. 2018, 78, 3295–3295. 10.1158/1538-7445.AM2018-3295. DOI

Grešová K.; Martinek V.; Čechák D.; Šimeček P.; Alexiou P. Genomic Benchmarks: A Collection of Datasets for Genomic Sequence Classification. BMC Genomic Data 2023, 24, 25. 10.1186/s12863-023-01123-8. PubMed DOI PMC

Buterez D.; Janet J. P.; Kiddle S. J.; Liò P. MF-PCBA: Multifidelity High-Throughput Screening Benchmarks for Drug Discovery and Machine Learning. J. Chem. Inf. Model. 2023, 63 (9), 2667–2678. 10.1021/acs.jcim.2c01569. PubMed DOI PMC

Walsh I.; Fishman D.; Garcia-Gasulla D.; Titma T.; Pollastri G.; Capriotti E.; Casadio R.; Capella-Gutierrez S.; Cirillo D.; Del Conte A.; Dimopoulos A. C.; Del Angel V. D.; Dopazo J.; Fariselli P.; Fernandez J. M.; Huber F.; Kreshuk A.; Lenaerts T.; Martelli P. L.; Navarro A.; Broin P. O; Pinero J.; Piovesan D.; Reczko M.; Ronzano F.; Satagopam V.; Savojardo C.; Spiwok V.; Tangaro M. A.; Tartari G.; Salgado D.; Valencia A.; Zambelli F.; Harrow J.; Psomopoulos F. E.; Tosatto S. C. E. DOME: Recommendations for Supervised Machine Learning Validation in Biology. Nat. Methods 2021, 18 (10), 1122–1127. 10.1038/s41592-021-01205-4. PubMed DOI

Mirdita M.; Schütze K.; Moriwaki Y.; Heo L.; Ovchinnikov S.; Steinegger M. ColabFold: Making Protein Folding Accessible to All. Nat. Methods 2022, 19 (6), 679–682. 10.1038/s41592-022-01488-1. PubMed DOI PMC

Lee B. D.; Gitter A.; Greene C. S.; Raschka S.; Maguire F.; Titus A. J.; Kessler M. D.; Lee A. J.; Chevrette M. G.; Stewart P. A.; Britto-Borges T.; Cofer E. M.; Yu K.-H.; Carmona J. J.; Fertig E. J.; Kalinin A. A.; Signal B.; Lengerich B. J.; Triche T. J. Jr; Boca S. M. Ten Quick Tips for Deep Learning in Biology. PLoS Comput. Biol. 2022, 18 (3), e1009803. 10.1371/journal.pcbi.1009803. PubMed DOI PMC

Samek W.; Müller K.-R.. Towards Explainable Artificial Intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Samek W., Montavon G., Vedaldi A., Hansen L. K., Müller K.-R., Eds.; Springer, 2019; pp 5–22.

Wellawatte G. P.; Gandhi H. A.; Seshadri A.; White A. D. A Perspective on Explanations of Molecular Prediction Models. J. Chem. Theory Comput. 2023, 19 (8), 2149–2160. 10.1021/acs.jctc.2c01235. PubMed DOI PMC

Holzinger A.; Saranti A.; Molnar C.; Biecek P.; Samek W.. Explainable AI Methods - A Brief Overview. In xxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers; Holzinger A., Goebel R., Fong R., Moon T., Müller K.-R., Samek W., Eds.; Springer, 2022; pp 13–38.

Montavon G.; Binder A.; Lapuschkin S.; Samek W.; Müller K.-R.. Layer-Wise Relevance Propagation: An Overview. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Samek W., Montavon G., Vedaldi A., Hansen L. K., Müller K.-R., Eds.; Springer, 2019; pp 193–209.

van der Zanden T. C.; Bodlaender H. L.; Hamers H. J. M. Efficiently Computing the Shapley Value of Connectivity Games in Low-Treewidth Graphs. Oper. Res. Int. J. 2023, 23, 6. 10.1007/s12351-023-00742-4. DOI

Ribeiro M. T.; Singh S.; Guestrin C.. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD ’16 San Francisco, CA, August 13–17, 2016; Association for Computing Machinery: New York, NY, 2016; pp 1135–1144.

Ivanovs M.; Kadikis R.; Ozols K. Perturbation-Based Methods for Explaining Deep Neural Networks: A Survey. Pattern Recognit. Lett. 2021, 150, 228–234. 10.1016/j.patrec.2021.06.030. DOI

Ma J.; Yu M. K.; Fong S.; Ono K.; Sage E.; Demchak B.; Sharan R.; Ideker T. Using Deep Learning to Model the Hierarchical Structure and Function of a Cell. Nat. Methods 2018, 15 (4), 290–298. 10.1038/nmeth.4627. PubMed DOI PMC

Novakovsky G.; Dexter N.; Libbrecht M. W.; Wasserman W. W.; Mostafavi S. Obtaining Genetics Insights from Deep Learning via Explainable Artificial Intelligence. Nat. Rev. Genet. 2023, 24 (2), 125–137. 10.1038/s41576-022-00532-2. PubMed DOI

Fortelny N.; Bock C. Knowledge-Primed Neural Networks Enable Biologically Interpretable Deep Learning on Single-Cell Sequencing Data. Genome Biol. 2020, 21, 190. 10.1186/s13059-020-02100-5. PubMed DOI PMC

Nikolados E.-M.; Wongprommoon A.; Aodha O. M.; Cambray G.; Oyarzún D. A. Accuracy and Data Efficiency in Deep Learning Models of Protein Expression. Nat. Commun. 2022, 13, 7755. 10.1038/s41467-022-34902-5. PubMed DOI PMC

Xu F.; Uszkoreit H.; Du Y.; Fan W.; Zhao D.; Zhu J.. Explainable AI: A Brief Survey on History, Research Areas, Approaches and Challenges. In Natural Language Processing and Chinese Computing; Springer, 2019; pp 563–574.

Shimazaki T.; Tachikawa M. Collaborative Approach between Explainable Artificial Intelligence and Simplified Chemical Interactions to Explore Active Ligands for Cyclin-Dependent Kinase 2. ACS Omega 2022, 7 (12), 10372–10381. 10.1021/acsomega.1c06976. PubMed DOI PMC

Probst D. Explainable Prediction of Catalysing Enzymes from Reactions Using Multilayer Perceptrons. bioRxiv (Bioinformatics), January 30, 2023, 2023.01.28.526009, ver. 1. 10.1101/2023.01.28.526009. DOI

Li C.; Liu J.; Chen J.; Yuan Y.; Yu J.; Gou Q.; Guo Y.; Pu X. An Interpretable Convolutional Neural Network Framework for Analyzing Molecular Dynamics Trajectories: A Case Study on Functional States for G-Protein-Coupled Receptors. J. Chem. Inf. Model. 2022, 62 (6), 1399–1410. 10.1021/acs.jcim.2c00085. PubMed DOI

Tan J.; Zhang Y.. ExplainableFold: Understanding AlphaFold Prediction with Explainable AI. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining KDD ’23; Association for Computing Machinery: New York, NY, 2023; pp 2166–2176.

Hoover B.; Strobelt H.; Gehrmann S.. ExBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations; Association for Computational Linguistics, 2020; pp 187–196.

Ferruz N.; Höcker B. Controllable Protein Design with Language Models. Nature Machine Intelligence 2022, 4 (6), 521–532. 10.1038/s42256-022-00499-z. DOI

Abd Elrahman S. M.; Abraham A. A review of class imbalance problem. J. Netw. Innov. Comput. 2013, 1, 332–340.

Haixiang G.; Yijing L.; Shang J.; Mingyun G.; Yuanyue H.; Bing G. Learning from Class-Imbalanced Data: Review of Methods and Applications. Expert Syst. Appl. 2017, 73, 220–239. 10.1016/j.eswa.2016.12.035. DOI

Kaur H.; Pannu H. S.; Malhi A. K. A Systematic Review on Imbalanced Data Challenges in Machine Learning: Applications and Solutions. ACM Comput. Surv. 2020, 52 (4), 1–36. 10.1145/3343440. DOI

Esposito C.; Landrum G. A.; Schneider N.; Stiefl N.; Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J. Chem. Inf. Model. 2021, 61 (6), 2623–2640. 10.1021/acs.jcim.1c00160. PubMed DOI

Weidinger L.; Mellor J.; Rauh M.; Griffin C.; Uesato J.; Huang P.-S.; Cheng M.; Glaese M.; Balle B.; Kasirzadeh A.; Kenton Z.; Brown S.; Hawkins W.; Stepleton T.; Biles C.; Birhane A.; Haas J.; Rimell L.; Hendricks L. A.; Isaac W.; Legassick S.; Irving G.; Gabriel I.. Ethical and Social Risks of Harm from Language Models. arXiv (Computer Science.Computation and Language), December 8, 2021, 2112.04359, ver. 1. 10.48550/arXiv.2112.04359. DOI

Kessler M. D.; Yerges-Armstrong L.; Taub M. A.; Shetty A. C.; Maloney K.; Jeng L. J. B.; Ruczinski I.; Levin A. M.; Williams L. K.; Beaty T. H.; Mathias R. A.; Barnes K. C.; et al. Challenges and Disparities in the Application of Personalized Genomic Medicine to Populations with African Ancestry. Nat. Commun. 2016, 7, 12521. 10.1038/ncomms12521. PubMed DOI PMC

Sullivan B. J.; Nguyen T.; Durani V.; Mathur D.; Rojas S.; Thomas M.; Syu T.; Magliery T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420 (4–5), 384–399. 10.1016/j.jmb.2012.04.025. PubMed DOI PMC

Fang J. A Critical Review of Five Machine Learning-Based Algorithms for Predicting Protein Stability Changes upon Mutation. Brief. Bioinform. 2020, 21 (4), 1285–1292. 10.1093/bib/bbz071. PubMed DOI PMC

Pucci F.; Bernaerts K. V.; Kwasigroch J. M.; Rooman M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34 (21), 3659–3665. 10.1093/bioinformatics/bty348. PubMed DOI

Caldararu O.; Blundell T. L.; Kepp K. P. A Base Measure of Precision for Protein Stability Predictors: Structural Sensitivity. BMC Bioinformatics 2021, 22, 88. 10.1186/s12859-021-04030-w. PubMed DOI PMC

Scantlebury J.; Vost L.; Carbery A.; Hadfield T. E.; Turnbull O. M.; Brown N.; Chenthamarakshan V.; Das P.; Grosjean H.; von Delft F.; Deane C. M. A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening. J. Chem. Inf. Model. 2023, 63 (10), 2960–2974. 10.1021/acs.jcim.3c00322. PubMed DOI PMC

Hebert-Johnson U.; Kim M.; Reingold O.; Rothblum G.. Multicalibration: Calibration for the (Computationally-Identifiable) Masses. In Proceedings of the 35th International Conference on Machine Learning; Dy J., Krause A., Eds.; Proceedings of Machine Learning Research; PMLR, 10–15 Jul 2018; Vol. 80, pp 1939–1948.

Gopalan P.; Kim M. P.; Singhal M. A.; Zhao S.. Low-Degree Multicalibration. In Proceedings of Thirty Fifth Conference on Learning Theory; Loh P.-L., Raginsky M., Eds.; Proceedings of Machine Learning Research, Vol. 178; PMLR, 2022; pp 3193–3234.

Kim M. P.; Ghorbani A.; Zou J.. Multiaccuracy: Black-Box Post-Processing for Fairness in Classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society AIES ’19; Association for Computing Machinery: New York, NY, 2019; pp 247–254.

Pessach D.; Shmueli E. Algorithmic Fairness. arXiv [cs.CY] 2020, 10.48550/arXiv.2001.09784. DOI

Minot M.; Reddy S. T.. Meta Learning Improves Robustness and Performance in Machine Learning-Guided Protein Engineering. bioRxiv, January 30, 2023, 2023.01.30.526201, ver. 1. 10.1101/2023.01.30.526201. DOI

Musaelian A.; Johansson A.; Batzner S.; Kozinsky B. Scaling the Leading Accuracy of Deep Equivariant Models to Biomolecular Simulations of Realistic Size. arXiv [physics.comp-ph] 2023, 10.48550/arXiv.2304.10061. DOI

Shaw D. E.; Adams P. J.; Azaria A.; Bank J. A.; Batson B.; Bell A.; Bergdorf M.; Bhatt J.; Butts J. A.; Correia T.; Dirks R. M.; Dror R. O.; Eastwood M. P.; Edwards B.; Even A.; Feldmann P.; Fenn M.; Fenton C. H.; Forte A.; Gagliardo J.; Gill G.; Gorlatova M.; Greskamp B.; Grossman J. P.; Gullingsrud J.; Harper A.; Hasenplaugh W.; Heily M.; Heshmat B. C.; Hunt J.; Ierardi D. J.; Iserovich L.; Jackson B. L.; Johnson N. P.; Kirk M. M.; Klepeis J. L.; Kuskin J. S.; Mackenzie K. M.; Mader R. J.; McGowen R.; McLaughlin A.; Moraes M. A.; Nasr M. H.; Nociolo L. J.; O’Donnell L.; Parker A.; Peticolas J. L.; Pocina G.; Predescu C.; Quan T.; Salmon J. K.; Schwink C.; Shim K. S.; Siddique N.; Spengler J.; Szalay T.; Tabladillo R.; Tartler R.; Taube A. G.; Theobald M.; Towles B.; Vick W.; Wang S. C.; Wazlowski M.; Weingarten M. J.; Williams J. M.; Yuh K. A.. Anton 3: Twenty Microseconds of Molecular Dynamics Simulation before Lunch. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis SC ’21; Association for Computing Machinery: New York, NY, 2021; pp 1–11.

Perdomo-Ortiz A.; Benedetti M.; Realpe-Gómez J.; Biswas R. Opportunities and Challenges for Quantum-Assisted Machine Learning in near-Term Quantum Computers. Quantum Sci. Technol. 2018, 3 (3), 030502. 10.1088/2058-9565/aab859. DOI

Caro M. C.; Huang H.-Y.; Cerezo M.; Sharma K.; Sornborger A.; Cincio L.; Coles P. J. Generalization in Quantum Machine Learning from Few Training Data. Nat. Commun. 2022, 13, 4919. 10.1038/s41467-022-32550-3. PubMed DOI PMC

Daley A. J.; Bloch I.; Kokail C.; Flannigan S.; Pearson N.; Troyer M.; Zoller P. Practical Quantum Advantage in Quantum Simulation. Nature 2022, 607 (7920), 667–676. 10.1038/s41586-022-04940-6. PubMed DOI

Ollitrault P. J.; Miessen A.; Tavernelli I. Molecular Quantum Dynamics: A Quantum Computing Perspective. Acc. Chem. Res. 2021, 54 (23), 4229–4238. 10.1021/acs.accounts.1c00514. PubMed DOI

Bender E. M.; Gebru T.; McMillan-Major A.; Shmitchell S.. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency FAccT ’21; Association for Computing Machinery: New York, NY, 2021; pp 610–623.

Patterson D.; Gonzalez J.; Holzle U.; Le Q.; Liang C.; Munguia L.-M.; Rothchild D.; So D. R.; Texier M.; Dean J. The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. Computer 2022, 55, 18. 10.1109/MC.2022.3148714. DOI

Vinod R.; Chen P.-Y.; Das P. Reprogramming Pretrained Language Models for Protein Sequence Representation Learning. arXiv [cs.LG] 2023, 10.48550/arXiv.2301.02120. DOI

Caldararu O.; Blundell T. L.; Kepp K. P. Three Simple Properties Explain Protein Stability Change upon Mutation. J. Chem. Inf. Model. 2021, 61 (4), 1981–1988. 10.1021/acs.jcim.1c00201. PubMed DOI

Hu E. J.; Shen Y.; Wallis P.; Allen-Zhu Z.; Li Y.; Wang S.; Wang L.; Chen W.. LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations, April 25–29, 2022; OpenReview, 2022. https://openreview.net/forum?id=nZeVKeeFYf9

Taori R.; Gulrajani I.; Zhang T.; Dubois Y.; Li X.; Guestrin C.. Stanford Alpaca: An Instruction-Following Llama Model. 2023.

Yang A.; Miech A.; Sivic J.; Laptev I.; Schmid C.; Koyejo S.; Mohamed S.; Agarwal A.; Belgrave D.; Cho K.; Oh A. Zero-Shot Video Question Answering via Frozen Bidirectional Language Models. Adv. Neural Inf. Process. Syst. 2022, 35, 124–141.

Anstine D. M.; Isayev O. Generative Models as an Emerging Paradigm in the Chemical Sciences. J. Am. Chem. Soc. 2023, 145 (16), 8736–8750. 10.1021/jacs.2c13467. PubMed DOI PMC

Popova M.; Isayev O.; Tropsha A. Deep Reinforcement Learning for de Novo Drug Design. Sci Adv 2018, 4 (7), eaap7885. 10.1126/sciadv.aap7885. PubMed DOI PMC

Lutz I. D.; Wang S.; Norn C.; Courbet A.; Borst A. J.; Zhao Y. T.; Dosey A.; Cao L.; Xu J.; Leaf E. M.; Treichel C.; Litvicov P.; Li Z.; Goodson A. D.; Rivera-Sánchez P.; Bratovianu A.-M.; Baek M.; King N. P.; Ruohola-Baker H.; Baker D. Top-down Design of Protein Architectures with Reinforcement Learning. Science 2023, 380 (6642), 266–273. 10.1126/science.adf6591. PubMed DOI

Wang Y.; Tang H.; Huang L.; Pan L.; Yang L.; Yang H.; Mu F.; Yang M. Self-Play Reinforcement Learning Guides Protein Engineering. Nature Machine Intelligence 2023, 5 (8), 845–860. 10.1038/s42256-023-00691-9. DOI
