AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models

2024 Nov 11; 64 (21): 8113-8122. [epub] 20241030

Language English Country United States Medium print-electronic

Document type journal articles

Persistent link   https://www.medvik.cz/link/pmid39475544

Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
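The idea of target-side generalization based on embedding similarities can be illustrated with a minimal sketch. It is not taken from the PCMol package: the function names, the target labels, and the 4-dimensional toy vectors are all hypothetical, and real AlphaFold2-derived embeddings are much larger. The sketch only ranks known targets by cosine similarity to a held-out target's embedding, which is one plausible way to see which well-characterized proteins a new target resembles.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_targets(query, known, k=2):
    """Return the k known targets most similar to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in known.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, for illustration only).
known_embeddings = {
    "kinase_A": [0.9, 0.1, 0.0, 0.2],
    "kinase_B": [0.8, 0.2, 0.1, 0.1],
    "gpcr_A": [0.0, 0.9, 0.8, 0.1],
}

# Hypothetical held-out target with no ligand data of its own.
held_out = [0.85, 0.15, 0.05, 0.15]

ranking = nearest_targets(held_out, known_embeddings, k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the held-out vector lies closest to the two kinase-like entries, mirroring the abstract's observation that embeddings cluster by target family.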
