AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models
Language English Country United States Media print-electronic
Document type journal article
PubMed
39475544
PubMed Central
PMC11558674
DOI
10.1021/acs.jcim.4c00309
Knihovny.cz E-resources
- MeSH
- Protein Conformation MeSH
- Ligands MeSH
- Models, Molecular * MeSH
- Proteins * chemistry metabolism MeSH
- Drug Design * MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Ligands MeSH
- Proteins * MeSH
Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
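As a rough illustration of what "target-side generalization based on embedding similarities" could mean in practice, the sketch below ranks known targets by cosine similarity to the embedding of a new protein. It is not taken from the PCMol code base: the function names, the target labels, and the 4-dimensional toy vectors are all hypothetical, and real AlphaFold2-derived embeddings would be far higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_targets(query, known, k=2):
    """Return the k known targets whose embeddings are most similar to the query."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in known.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; labels and values are invented for illustration only.
known_embeddings = {
    "kinase_A": [0.9, 0.1, 0.0, 0.2],
    "kinase_B": [0.8, 0.2, 0.1, 0.1],
    "gpcr_A": [0.1, 0.9, 0.3, 0.0],
}
new_target = [0.85, 0.15, 0.05, 0.15]

ranked = nearest_targets(new_target, known_embeddings, k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy setting the two kinase-like vectors rank above the GPCR-like one, which mirrors the idea that a new protein lying close to well-characterized targets in embedding space can borrow from their ligand data.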