AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models
A. Bernatavicius, M. Šícho, APA. Janssen, AK. Hassen, M. Preuss, GJP. van Westen
Language English Country United States
Document type Journal Article
- MeSH
- Protein Conformation MeSH
- Ligands MeSH
- Models, Molecular * MeSH
- Proteins * chemistry metabolism MeSH
- Drug Design * MeSH
- Publication type
- Journal Article MeSH
Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. Software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
- 000
- 00000naa a2200000 a 4500
- 001
- bmc25003482
- 003
- CZ-PrNML
- 005
- 20250206104353.0
- 007
- ta
- 008
- 250121s2024 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1021/acs.jcim.4c00309 $2 doi
- 035 __
- $a (PubMed)39475544
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Bernatavicius, Andrius $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- 245 10
- $a AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models / $c A. Bernatavicius, M. Šícho, APA. Janssen, AK. Hassen, M. Preuss, GJP. van Westen
- 520 9_
- $a Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. Software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
- 650 12
- $a proteins $x chemistry $x metabolism $7 D011506
- 650 12
- $a drug design $7 D015195
- 650 12
- $a models, molecular $7 D008958
- 650 _2
- $a protein conformation $7 D011487
- 650 _2
- $a ligands $7 D008024
- 655 _2
- $a journal article $7 D016428
- 700 1_
- $a Šícho, Martin $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
- 700 1_
- $a Janssen, Antonius P A $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $1 https://orcid.org/000000034203261X
- 700 1_
- $a Hassen, Alan Kai $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- 700 1_
- $a Preuss, Mike $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
- 700 1_
- $a van Westen, Gerard J P $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $1 https://orcid.org/0000000307171817
- 773 0_
- $w MED00008945 $t Journal of chemical information and modeling $x 1549-960X $g Vol. 64, No. 21 (2024), pp. 8113-8122
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/39475544 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y - $z 0
- 990 __
- $a 20250121 $b ABA008
- 991 __
- $a 20250206104348 $b ABA008
- 999 __
- $a ok $b bmc $g 2263320 $s 1239489
- BAS __
- $a 3
- BAS __
- $a PreBMC-MEDLINE
- BMC __
- $a 2024 $b 64 $c 21 $d 8113-8122 $e 20241030 $i 1549-960X $m Journal of chemical information and modeling $n J Chem Inf Model $x MED00008945
- LZP __
- $a Pubmed-20250121