Design of proteins by parallel tempering in the sequence space
Language: English; Country: United States; Medium: print
Document type: journal article
Grant support
LM2023055
Ministry of Education, Youth and Sports (Czech Republic)
LUC 24136
Ministry of Education, Youth and Sports (Czech Republic)
CA21160
European Cooperation in Science and Technology
ML4NGP
European Cooperation in Science and Technology
PubMed: 40990840
PubMed Central: PMC12459223
DOI: 10.1002/pro.70246
- Keywords
- ESMfold, Monte Carlo, machine learning, parallel tempering, protein design, replica exchange
- MeSH
- Algorithms * MeSH
- Hydrophobic and Hydrophilic Interactions MeSH
- Protein Conformation MeSH
- Monte Carlo Method MeSH
- Models, Molecular MeSH
- Protein Engineering * methods MeSH
- Proteins * chemistry, genetics MeSH
- Amino Acid Sequence MeSH
- Thermodynamics MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Proteins * MeSH
Computational design of new proteins is often performed by optimizing the amino acid sequence. Each sequence is assigned an energy (a lower energy means a higher propensity to form the desired 3D structure), and this energy is sampled and minimized. Here, we use the parallel tempering algorithm to accelerate this task. ESMfold was used to predict the structures of the sampled proteins and to calculate the energy. Starting from random amino acid sequences, each sequence was sampled by the Monte Carlo method at one of a series of temperatures, and these replicas were exchanged by the parallel tempering method. A series of 100- or 200-residue proteins was designed to maximize structure-prediction confidence and globularity and to minimize the number of hydrophobic residues on the surface. We show that parallel tempering is a viable alternative to Monte Carlo sampling without replica exchanges, to simulated annealing, and to related energy-based protein design methods, especially when a continuous flow of designed sequences is desired.
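The sampling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The energy function `toy_energy` is a hypothetical stand-in (the fraction of hydrophobic residues) for the ESMfold-based energy, which is not reproduced here, and the temperature ladder, number of sweeps, and swap interval are arbitrary assumptions. Replica swaps use the standard parallel tempering criterion, with acceptance probability min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]).

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")  # assumed hydrophobic set, for illustration only


def toy_energy(seq):
    """Hypothetical stand-in energy: fraction of hydrophobic residues (lower is better)."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def mutate(seq, rng):
    """Propose a single-residue substitution at a random position."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def metropolis_accept(delta, rng):
    """Accept with probability min(1, exp(-delta))."""
    return delta <= 0 or rng.random() < math.exp(-delta)


def parallel_tempering(energy, length, temperatures, n_sweeps, swap_every, seed=0):
    """Run one Monte Carlo chain per temperature and periodically swap neighbours."""
    rng = random.Random(seed)
    # Start every replica from its own random sequence.
    seqs = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in temperatures]
    energies = [energy(s) for s in seqs]
    for sweep in range(1, n_sweeps + 1):
        # Monte Carlo move in each replica at its own temperature.
        for i, temp in enumerate(temperatures):
            candidate = mutate(seqs[i], rng)
            e_candidate = energy(candidate)
            if metropolis_accept((e_candidate - energies[i]) / temp, rng):
                seqs[i], energies[i] = candidate, e_candidate
        # Attempt exchanges between neighbouring temperatures.
        if sweep % swap_every == 0:
            for i in range(len(temperatures) - 1):
                d_beta = 1.0 / temperatures[i] - 1.0 / temperatures[i + 1]
                delta = -d_beta * (energies[i] - energies[i + 1])
                if metropolis_accept(delta, rng):
                    seqs[i], seqs[i + 1] = seqs[i + 1], seqs[i]
                    energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return seqs, energies


if __name__ == "__main__":
    ladder = [0.01, 0.02, 0.05, 0.1]  # assumed temperature ladder
    final_seqs, final_energies = parallel_tempering(
        toy_energy, length=30, temperatures=ladder, n_sweeps=500, swap_every=10)
    for temp, seq, e in zip(ladder, final_seqs, final_energies):
        print(f"T={temp:<5} E={e:.3f} {seq}")
```

In this sketch the lowest-temperature replica can be read out at any point during the run, which is one way to picture the continuous flow of designed sequences mentioned in the abstract. In a real application the call to `toy_energy` would be replaced by a structure-prediction-based score.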