GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening
Status: PubMed-not-MEDLINE; Language: English; Country: United States; Medium: electronic-eCollection
Document type: journal articles
PubMed: 38947828
PubMed Central: PMC11209901
DOI: 10.1021/acsomega.4c01583
Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline for the generation of 153 chemical features for a given molecule taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the anthelmintic drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay.
In the second case study, the hit rate of GASP was 50%. Although this is lower than the 83% hit rate obtained with expert-selected enzymes, it offers reasonable performance for cases where expert opinion is unavailable. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor that can be trained on other data sets, enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing of GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
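The evaluation described in the abstract (a random forest classifier over GT1:acceptor pairs, scored by AUROC and balanced accuracy on held-out pairs) can be illustrated with a minimal scikit-learn sketch. This is not the GASP code: the data are synthetic, the labeling rule is invented for illustration, and the function name `make_pairs`, the 20 enzyme features, and the hyperparameters are all assumptions. Only the count of 153 chemical features and the 1001 held-out pairs mirror numbers given in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(0)


def make_pairs(n_pairs, n_enzyme_feats=20, n_chem_feats=153):
    """Return synthetic (features, labels) standing in for GT1:acceptor pairs.

    Hypothetical helper: real enzyme and chemical features would come from
    sequence encodings and the feature-generation pipeline, not random draws.
    """
    enzyme = rng.normal(size=(n_pairs, n_enzyme_feats))
    chem = rng.normal(size=(n_pairs, n_chem_feats))
    # Invented rule: reactivity depends on one enzyme-chemical interaction
    # plus one purely chemical property, with added noise.
    signal = enzyme[:, 0] * chem[:, 0] + chem[:, 1]
    noise = rng.normal(scale=0.5, size=n_pairs)
    labels = (signal + noise > 0).astype(int)
    # Each pair is represented by concatenating enzyme and acceptor features.
    return np.hstack([enzyme, chem]), labels


# "Literature-like" training pairs and an independent held-out set.
X_train, y_train = make_pairs(2000)
X_test, y_test = make_pairs(1001)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Probability of the reactive class, used as the prediction score.
scores = model.predict_proba(X_test)[:, 1]

auroc = roc_auc_score(y_test, scores)
bal_acc = balanced_accuracy_score(y_test, (scores >= 0.5).astype(int))

print(f"AUROC: {auroc:.2f}")
print(f"Balanced accuracy: {bal_acc:.2f}")
```

AUROC is computed from the continuous probability scores, whereas balanced accuracy needs a hard decision; the 0.5 cutoff used here is an assumption, and a different threshold would change the balanced accuracy but not the AUROC.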