Generate what you can make: achieving in-house synthesizability with readily available resources in de novo drug design

2025 Mar 28;17(1):41. Epub 2025 Mar 28.

Status: PubMed-not-MEDLINE
Language: English
Country: England, Great Britain
Medium: electronic

Document type: Journal Article

Persistent link: https://www.medvik.cz/link/pmid40155970

Grant support
956832 Horizon 2020 Framework Programme
22-17367O Czech Science Foundation Grant
LM2023052 Ministry of Education, Youth and Sports of the Czech Republic
NWO ENPPS.LIFT.019.010 Dutch Research Council (NWO)

Links

PubMed 40155970
PubMed Central PMC11954305
DOI 10.1186/s13321-024-00910-4
PII: 10.1186/s13321-024-00910-4

Computer-Aided Synthesis Planning (CASP) and CASP-based approximated synthesizability scores have rarely been used as generation objectives in Computer-Aided Drug Design, even though they facilitate the in-silico generation of synthesizable molecules. However, these synthesizability approaches are disconnected from the reality of small-laboratory drug design, where building block resources are limited; this makes the notion of in-house synthesizability, i.e. synthesizability with already available resources, highly desirable. In this work, we show a successful in-house de novo drug design workflow that generates active and in-house synthesizable ligands of monoglyceride lipase (MGLL). First, we demonstrate that CASP transfers successfully from 17.4 million commercial building blocks to a small laboratory setting of roughly 6000 building blocks, with only a 12% decrease in CASP success when synthesis routes that are, on average, two reaction steps longer are accepted. Next, we present a rapidly retrainable in-house synthesizability score that successfully captures our in-house synthesizability without relying on external building block resources. We show that including this score in a multi-objective de novo drug design workflow, alongside a simple QSAR model, yields thousands of potentially active and easily in-house synthesizable molecules. Finally, we experimentally evaluate the synthesis and biochemical activity of three de novo candidates, using their CASP-suggested synthesis routes and only in-house building blocks. We find one candidate with evident activity, suggesting potential new ligand ideas for MGLL inhibitors while showcasing the usefulness of our in-house synthesizability score for de novo drug design.

Scientific contribution: Our core scientific contribution is the introduction of in-house de novo drug design, which enables the practical application of generative methods in small laboratories by utilizing a limited stock of available building blocks. Our fast-to-adapt workflow for in-house synthesizability scoring requires minimal computational retraining costs while supporting a high diversity of generated structures. We highlight the practicality of our approach through a comprehensive in-vitro case study that relies entirely on in-house resources, including in-silico generation, synthesis planning, and activity evaluation.
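The notion of in-house synthesizability can be made concrete with a small sketch. Assuming each CASP result is available as a route tree whose leaves are starting materials, a target counts as solved for a given stock only if some route has all of its leaves in that stock, and the CASP success rate is the fraction of solved targets. The `RouteNode` class, the function names, and the toy SMILES routes below are hypothetical illustrations, not the authors' code or data. Filtering precomputed routes after the fact is also a simplification: a real CASP tool receives the stock during the search itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RouteNode:
    """Hypothetical route-tree node: a molecule (SMILES) and the precursors it is made from.
    A node without precursors is a starting material (building block)."""
    smiles: str
    precursors: list[RouteNode] = field(default_factory=list)


def leaf_smiles(node: RouteNode) -> list[str]:
    """Collect the starting materials (leaves) of a route tree."""
    if not node.precursors:
        return [node.smiles]
    leaves: list[str] = []
    for child in node.precursors:
        leaves.extend(leaf_smiles(child))
    return leaves


def n_steps(node: RouteNode) -> int:
    """Count reaction steps, i.e. the internal (non-leaf) nodes of the tree."""
    if not node.precursors:
        return 0
    return 1 + sum(n_steps(child) for child in node.precursors)


def route_in_stock(route: RouteNode, stock: set[str]) -> bool:
    """A route is usable only if every starting material is in the given stock."""
    return all(smi in stock for smi in leaf_smiles(route))


def shortest_solved_route(routes: list[RouteNode], stock: set[str]) -> RouteNode | None:
    """Return the shortest usable route for one target, or None if no route is usable."""
    usable = [r for r in routes if route_in_stock(r, stock)]
    return min(usable, key=n_steps) if usable else None


def success_rate(routes_per_target: dict[str, list[RouteNode]], stock: set[str]) -> float:
    """Fraction of targets with at least one route whose leaves are all in stock."""
    solved = sum(shortest_solved_route(r, stock) is not None for r in routes_per_target.values())
    return solved / len(routes_per_target)


# Toy stocks (hypothetical): the in-house stock is a small subset of the commercial one.
in_house = {"CCO", "CC(=O)O", "Nc1ccccc1", "O=S(Cl)Cl"}
commercial = in_house | {"CC(=O)Cl", "Brc1ccccc1", "OB(O)c1ccccc1"}

acid_chloride = RouteNode("CC(=O)Cl", [RouteNode("CC(=O)O"), RouteNode("O=S(Cl)Cl")])
targets = {
    # Ester from ethanol and acetic acid (1 step).
    "CCOC(C)=O": [RouteNode("CCOC(C)=O", [RouteNode("CCO"), RouteNode("CC(=O)O")])],
    # Acetanilide: 1 step from purchased acetyl chloride, or 2 steps via an acid chloride made in-house.
    "CC(=O)Nc1ccccc1": [
        RouteNode("CC(=O)Nc1ccccc1", [RouteNode("CC(=O)Cl"), RouteNode("Nc1ccccc1")]),
        RouteNode("CC(=O)Nc1ccccc1", [acid_chloride, RouteNode("Nc1ccccc1")]),
    ],
    # Biphenyl via Suzuki coupling; both partners are commercial only.
    "c1ccc(-c2ccccc2)cc1": [
        RouteNode("c1ccc(-c2ccccc2)cc1", [RouteNode("Brc1ccccc1"), RouteNode("OB(O)c1ccccc1")])
    ],
}

print(f"commercial: {success_rate(targets, commercial):.2f}")  # commercial: 1.00
print(f"in-house:   {success_rate(targets, in_house):.2f}")    # in-house:   0.67
```

With the smaller stock, one target is lost and the amide needs a two-step route through the acid chloride instead of the one-step route available commercially, mirroring in miniature the trade-off the abstract describes (fewer solved targets, longer routes).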

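The abstract combines the in-house synthesizability score with a QSAR activity model in a multi-objective workflow. One common way to rank candidates on two objectives without choosing weights is Pareto (non-dominated) sorting. The sketch below assumes both scores are to be maximised; the molecule names and scores are made up, and this illustrates the general technique rather than the exact scheme used in the study.

```python
from __future__ import annotations


def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one
    (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: dict[str, tuple[float, ...]]) -> list[list[str]]:
    """Non-dominated sorting: front 0 holds molecules that no other molecule dominates,
    front 1 holds those dominated only by front 0, and so on."""
    remaining = dict(scores)
    fronts: list[list[str]] = []
    while remaining:
        # A point never dominates itself, so comparing against all remaining values is safe.
        front = [
            name for name, s in remaining.items()
            if not any(dominates(other, s) for other in remaining.values())
        ]
        fronts.append(sorted(front))
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical (predicted activity, in-house synthesizability) scores for generated molecules.
scores = {
    "mol_A": (0.9, 0.2),
    "mol_B": (0.7, 0.8),
    "mol_C": (0.4, 0.9),
    "mol_D": (0.6, 0.7),
    "mol_E": (0.3, 0.3),
}

for rank, front in enumerate(pareto_fronts(scores)):
    print(rank, front)
# 0 ['mol_A', 'mol_B', 'mol_C']
# 1 ['mol_D']
# 2 ['mol_E']
```

Selecting candidates from the first fronts keeps molecules that trade predicted activity against in-house synthesizability (here mol_A, mol_B and mol_C) instead of collapsing both objectives into a single weighted sum.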