Anticipating protein evolution with successor sequence predictor
Status PubMed-not-MEDLINE
Language English
Country England, Great Britain
Media electronic
Document type Journal Article
Grant support
CZ.02.2.69/0.0/0.0/19_073/0016943
Grant Agency of Masaryk University
FIT-S-23-8209
Fakulta Informačních Technologií, Vysoké Učení Technické v Brně
LM2023069
Ministerstvo Školství, Mládeže a Tělovýchovy
857560
Horizon 2020
LX22NPO5102
European Union - Next Generation EU
PubMed
40119373
PubMed Central
PMC11927200
DOI
10.1186/s13321-025-00971-z
PII: 10.1186/s13321-025-00971-z
- Keywords
- Activity, Adaptation, Evolution, Evolutionary trajectory, Protein design, Solubility, Thermostability
- Publication type
- Journal Article
The quest to predict and understand protein evolution has been hindered by limitations on both the theoretical and the experimental fronts. Most existing theoretical models of evolution are descriptive rather than predictive, leaving the final modifications in the hands of researchers. Existing experimental techniques that probe the evolutionary sequence space of proteins, such as directed evolution, are resource-intensive and require specialised skills. We present the Successor Sequence Predictor (SSP) as an innovative solution. SSP is an in silico protein design method that mimics laboratory-based protein evolution: it reconstructs a protein's evolutionary history and suggests future amino acid substitutions based on trends observed in that history through carefully selected physicochemical descriptors. This approach enhances specialised proteins by predicting mutations that improve desired properties, such as thermostability, activity, and solubility. SSP can thus be used as a general protein engineering tool to develop practically useful proteins. The code of SSP is provided at https://github.com/loschmidt/successor-sequence-predictor, and the design of mutations will also be possible via an easy-to-use web server at https://loschmidt.chemi.muni.cz/fireprotasr/.
SCIENTIFIC CONTRIBUTION: SSP advances protein evolution prediction at the amino acid level by integrating ancestral sequence reconstruction with a novel in silico approach that models evolutionary trends through selected physicochemical descriptors. Unlike prior work, SSP can forecast future amino acid substitutions that enhance protein properties such as thermostability, activity, and solubility. This method reduces reliance on resource-intensive directed evolution techniques while providing a generalisable, predictive tool for protein engineering.
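The core idea in the abstract, extrapolating a trend in a physicochemical descriptor along a reconstructed lineage, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single alignment column, a straight-line fit, and the Kyte-Doolittle hydropathy scale as a stand-in descriptor; the function name `predict_successor`, the `depths` argument, and the one-step extrapolation rule are all hypothetical choices made for illustration.

```python
import numpy as np

# Kyte-Doolittle hydropathy values, used here only as a stand-in
# physicochemical descriptor (assumption: any per-residue scale could be used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def predict_successor(lineage_residues, depths, descriptor=HYDROPATHY):
    """Suggest a successor residue for one alignment column.

    lineage_residues: residues at this column, ordered from the oldest
                      reconstructed ancestor to the present-day sequence.
    depths:           position of each node along the lineage
                      (hypothetical units, e.g. distance from the root).
    Returns (suggested residue, extrapolated descriptor value, slope).
    """
    values = np.array([descriptor[aa] for aa in lineage_residues], dtype=float)
    x = np.asarray(depths, dtype=float)

    # Least-squares straight line through the descriptor values.
    slope, intercept = np.polyfit(x, values, 1)

    # Extrapolate one step beyond the present-day sequence, using the
    # spacing of the last two nodes as the step size (an assumption).
    next_x = x[-1] + (x[-1] - x[-2])
    predicted = slope * next_x + intercept

    # Pick the amino acid whose descriptor value is closest to the prediction.
    best = min(descriptor, key=lambda aa: abs(descriptor[aa] - predicted))
    return best, predicted, slope


if __name__ == "__main__":
    # Toy lineage: Ser -> Ala -> Met at one column, at depths 0, 1, 2.
    residue, value, slope = predict_successor(["S", "A", "M"], [0, 1, 2])
    print(residue, round(value, 3), round(slope, 3))
```

In this toy lineage the hydropathy rises from ancestor to present, so the sketch suggests a more hydrophobic successor. A realistic workflow would repeat this over every column and several descriptors, and would filter the suggestions by how well the trend fits.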