AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models

A. Bernatavicius, M. Šícho, APA. Janssen, AK. Hassen, M. Preuss, GJP. van Westen

Journal of chemical information and modeling. 2024 ; 64 (21) : 8113-8122. [pub] 20241030

Language English Country United States

Document type Journal Article

Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.

000      
00000naa a2200000 a 4500
001      
bmc25003482
003      
CZ-PrNML
005      
20250206104353.0
007      
ta
008      
250121s2024 xxu f 000 0|eng||
009      
AR
024    7_
$a 10.1021/acs.jcim.4c00309 $2 doi
035    __
$a (PubMed)39475544
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxu
100    1_
$a Bernatavicius, Andrius $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
245    10
$a AlphaFold Meets De Novo Drug Design: Leveraging Structural Protein Information in Multitarget Molecular Generative Models / $c A. Bernatavicius, M. Šícho, APA. Janssen, AK. Hassen, M. Preuss, GJP. van Westen
520    9_
$a Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
650    12
$a Proteins $x chemistry $x metabolism $7 D011506
650    12
$a Drug Design $7 D015195
650    12
$a Models, Molecular $7 D008958
650    _2
$a Protein Conformation $7 D011487
650    _2
$a Ligands $7 D008024
655    _2
$a Journal Article $7 D016428
700    1_
$a Šícho, Martin $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28 Prague, Czech Republic
700    1_
$a Janssen, Antonius P A $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $u Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $1 https://orcid.org/000000034203261X
700    1_
$a Hassen, Alan Kai $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
700    1_
$a Preuss, Mike $u Leiden Institute of Advanced Computer Science, Leiden University, Niels Bohrweg 1, 2333CA Leiden, The Netherlands
700    1_
$a van Westen, Gerard J P $u Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333CC Leiden, The Netherlands $1 https://orcid.org/0000000307171817
773    0_
$w MED00008945 $t Journal of chemical information and modeling $x 1549-960X $g Vol. 64, No. 21 (2024), pp. 8113-8122
856    41
$u https://pubmed.ncbi.nlm.nih.gov/39475544 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20250121 $b ABA008
991    __
$a 20250206104348 $b ABA008
999    __
$a ok $b bmc $g 2263320 $s 1239489
BAS    __
$a 3
BAS    __
$a PreBMC-MEDLINE
BMC    __
$a 2024 $b 64 $c 21 $d 8113-8122 $e 20241030 $i 1549-960X $m Journal of chemical information and modeling $n J Chem Inf Model $x MED00008945
LZP    __
$a Pubmed-20250121
