• This record comes from PubMed

Anticipating protein evolution with successor sequence predictor

2025 Mar 21; 17(1): 34. [epub] 20250321

Status PubMed-not-MEDLINE
Language English
Country England, Great Britain
Media electronic

Document type Journal Article

Grant support
CZ.02.2.69/0.0/0.0/19_073/0016943 Grant Agency of Masaryk University
FIT-S-23-8209 Fakulta Informačních Technologií, Vysoké Učení Technické v Brně
LM2023069 Ministerstvo Školství, Mládeže a Tělovýchovy
857560 Horizon 2020
LX22NPO5102 European Union - Next Generation EU

Links

PubMed 40119373
PubMed Central PMC11927200
DOI 10.1186/s13321-025-00971-z
PII: 10.1186/s13321-025-00971-z

The quest to predict and understand protein evolution has been hindered by limitations on both the theoretical and the experimental fronts. Most existing theoretical models of evolution are descriptive rather than predictive, leaving the final modifications in the hands of researchers. Existing experimental techniques for probing the evolutionary sequence space of proteins, such as directed evolution, are resource-intensive and require specialised skills. We present the Successor Sequence Predictor (SSP) as an innovative solution. SSP is an in silico protein design method that mimics laboratory-based protein evolution: it reconstructs a protein's evolutionary history and suggests future amino acid substitutions based on trends observed in that history through carefully selected physicochemical descriptors. This approach enhances specialised proteins by predicting mutations that improve desired properties, such as thermostability, activity, and solubility. SSP can thus be used as a general protein engineering tool to develop practically useful proteins. The code of SSP is provided at https://github.com/loschmidt/successor-sequence-predictor, and the design of mutations will also be possible via an easy-to-use web server at https://loschmidt.chemi.muni.cz/fireprotasr/.

SCIENTIFIC CONTRIBUTION: SSP advances protein evolution prediction at the amino acid level by integrating ancestral sequence reconstruction with a novel in silico approach that models evolutionary trends through selected physicochemical descriptors. Unlike prior work, SSP can forecast future amino acid substitutions that enhance protein properties such as thermostability, activity, and solubility. This method reduces reliance on resource-intensive directed evolution techniques while providing a generalizable, predictive tool for protein engineering.
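The core idea in the abstract, following a physicochemical descriptor along a reconstructed lineage and extrapolating its trend to propose a future substitution, can be illustrated with a small sketch. This is not the authors' implementation. The Kyte-Doolittle hydropathy scale stands in for the "carefully selected" descriptors, the straight-line fit is an assumed trend model, and the names `fit_line`, `predict_successor`, and the example lineage are hypothetical.

```python
# Illustrative sketch only: extrapolate a descriptor trend at ONE alignment
# position along an ancestral lineage and suggest the closest amino acid.
# Assumptions (not from the record): Kyte-Doolittle hydropathy as descriptor,
# ordinary least-squares line as the trend model, made-up lineage data.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct distances to fit a trend")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_successor(lineage, step=0.1, scale=KYTE_DOOLITTLE):
    """Suggest a future residue for one position.

    lineage: list of (distance_from_root, residue), ordered from the oldest
             reconstructed ancestor to the present-day protein.
    step:    how far past the present-day protein to extrapolate
             (hypothetical parameter, same units as the distances).
    Returns (suggested_residue, extrapolated_descriptor_value).
    """
    xs = [distance for distance, _ in lineage]
    ys = [scale[residue] for _, residue in lineage]
    slope, intercept = fit_line(xs, ys)
    target = slope * (xs[-1] + step) + intercept
    # Pick the amino acid whose descriptor value is nearest to the target.
    suggestion = min(scale, key=lambda aa: abs(scale[aa] - target))
    return suggestion, target


if __name__ == "__main__":
    # Hypothetical lineage with steadily increasing hydropathy: S -> A -> M -> C
    example = [(0.0, "S"), (0.2, "A"), (0.4, "M"), (0.6, "C")]
    residue, value = predict_successor(example)
    print(f"suggested successor residue: {residue} (descriptor {value:.2f})")
```

For the made-up lineage above, the fitted trend points toward higher hydropathy and the sketch suggests leucine. A real pipeline would also need the alignment, tree inference, rooting, and ancestral reconstruction steps, none of which are modelled here.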


