Recent advancements in deep learning and generative models have significantly expanded the applications of virtual screening for drug-like compounds. Here, we introduce a multitarget transformer model, PCMol, that leverages the latent protein embeddings derived from AlphaFold2 as a means of conditioning a de novo generative model on different targets. Incorporating rich protein representations allows the model to capture their structural relationships, enabling the chemical space interpolation of active compounds and target-side generalization to new proteins based on embedding similarities. In this work, we benchmark against other existing target-conditioned transformer models to illustrate the validity of using AlphaFold protein representations over raw amino acid sequences. We show that low-dimensional projections of these protein embeddings cluster appropriately based on target families and that model performance declines when these representations are intentionally corrupted. We also show that the PCMol model generates diverse, potentially active molecules for a wide array of proteins, including those with sparse ligand bioactivity data. The generated compounds display higher similarity to known active ligands of held-out targets and have comparable molecular docking scores while maintaining novelty. Additionally, we demonstrate the important role of data augmentation in bolstering the performance of generative models in low-data regimes. The software package and AlphaFold protein embeddings are freely available at https://github.com/CDDLeiden/PCMol.
- MeSH
  - Protein Conformation
  - Ligands
  - Models, Molecular *
  - Proteins * chemistry metabolism
  - Drug Design *
- Publication type
  - Journal Article
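
The PCMol abstract above relies on two ideas that are easy to illustrate: low-dimensional projections of protein embeddings, and matching a new protein to known targets by embedding similarity. The following is a minimal sketch of both, not the PCMol implementation. The function names, the 64-dimensional toy vectors, and the two synthetic "families" are all assumptions made for illustration; real AlphaFold2 embeddings would replace the random data.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project rows onto the top principal components (PCA via SVD)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def most_similar_target(query: np.ndarray, embeddings: np.ndarray) -> int:
    """Index of the row in `embeddings` with the highest cosine similarity to `query`."""
    q = query / max(np.linalg.norm(query), 1e-12)
    norms = np.linalg.norm(embeddings, axis=1)
    sims = embeddings @ q / np.clip(norms, 1e-12, None)
    return int(np.argmax(sims))


# Toy stand-in data (hypothetical): two synthetic "target families",
# each a noisy cloud around its own centre in a 64-dimensional space.
rng = np.random.default_rng(0)
family_centers = rng.normal(size=(2, 64)) * 5.0
family_labels = np.array([0] * 10 + [1] * 10)
toy_embeddings = family_centers[family_labels] + rng.normal(size=(20, 64))

projection = pca_project(toy_embeddings)            # 2-D coordinates for plotting
similarity = cosine_similarity_matrix(toy_embeddings)

# A "new protein" drawn near family 0 should match a family-0 target.
new_protein = family_centers[0] + rng.normal(size=64)
closest = most_similar_target(new_protein, toy_embeddings)
```

Plotting `projection` coloured by `family_labels` would show whether the toy families separate, which is the kind of check the abstract describes for target families.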
Neurotropic pathogens, notably herpesviruses, have been associated with significant neuropsychiatric effects. As a group, these pathogens can exploit molecular mimicry mechanisms to manipulate the host central nervous system to their advantage. Here, we present a systematic computational approach that may ultimately be used to unravel protein-protein interactions and molecular mimicry processes that have not yet been solved experimentally. Toward this end, we validate this approach by replicating a set of pre-existing experimental findings that document the structural and functional similarities shared by the human cytomegalovirus-encoded UL144 glycoprotein and human tumor necrosis factor receptor superfamily member 14 (TNFRSF14). We began with a thorough exploration of the Homo sapiens protein database using the Basic Local Alignment Search Tool (BLASTx) to identify proteins sharing sequence homology with UL144. Subsequently, we used AlphaFold2 to predict the independent three-dimensional structures of UL144 and TNFRSF14. This was followed by a comprehensive structural comparison facilitated by Distance-Matrix Alignment and Foldseek. Finally, we used AlphaFold-multimer and PPIscreenML to elucidate potential protein complexes and confirm the predicted binding activities of both UL144 and TNFRSF14. We then used our in silico approach to replicate the experimental finding that revealed TNFRSF14 binding to both B- and T-lymphocyte attenuator (BTLA) and glycoprotein domain, and UL144 binding to BTLA alone. This computational framework offers promise in identifying structural similarities and interactions between pathogen-encoded proteins and their host counterparts. This information will provide valuable insights into the cognitive mechanisms underlying the neuropsychiatric effects of viral infections.
- MeSH
  - Cognition physiology
  - Humans
  - Molecular Mimicry *
  - Models, Molecular
  - Amino Acid Sequence
  - Protein Binding
  - Viral Proteins metabolism chemistry
  - Computational Biology methods
- Check Tag
  - Humans
- Publication type
  - Journal Article
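
The first step of the pipeline in the abstract above is a BLAST search for host proteins with sequence homology to a viral query. A minimal sketch of how such hits might be shortlisted before structure prediction is given below. It assumes the search was exported in BLAST's standard 12-column tabular format (`-outfmt 6`); the `Hit` type, the `shortlist` helper, the e-value cutoff, and the sample rows with placeholder identifiers are all hypothetical and are not taken from the study.

```python
import csv
import io
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bitscore: float


def parse_blast_tabular(text: str) -> List[Hit]:
    """Parse BLAST tabular output (outfmt 6: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore)."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append(
            Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))
        )
    return hits


def shortlist(hits: List[Hit], max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits at or below the e-value cutoff, best bit score first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.bitscore, reverse=True)


# Placeholder rows with made-up identifiers and values, for illustration only.
sample_rows = [
    ["query1", "subjectA", "31.2", "120", "70", "4", "5", "120", "10", "128", "2e-08", "52.4"],
    ["query1", "subjectB", "45.0", "110", "55", "2", "1", "110", "3", "112", "1e-20", "95.1"],
    ["query1", "subjectC", "25.0", "40", "28", "1", "10", "49", "60", "99", "3.1", "22.0"],
]
sample_text = "\n".join("\t".join(row) for row in sample_rows)

candidates = shortlist(parse_blast_tabular(sample_text))
```

In a workflow like the one described, the shortlisted subjects would be the candidates passed on to structure prediction and structural comparison; the cutoff itself is a judgment call and would need tuning.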
The AlphaFold2 prediction algorithm opened up the possibility of exploring proteins' structural space at an unprecedented scale. Currently, >200 million protein structures predicted by this approach are deposited in AlphaFoldDB, covering entire proteomes of multiple organisms, including humans. Predicted structures are, however, stored without detailed functional annotations describing their chemical behaviour. Partial atomic charges, which map electron distribution over a molecule and provide a clue to its chemical reactivity, are an important example of such data. We introduce the web application αCharges: a tool for the quick calculation of partial atomic charges for protein structures from AlphaFoldDB. The charges are calculated by the recent empirical method SQE+qp, parameterised for this class of molecules using robust quantum mechanics charges (B3LYP/6-31G*/NPA) on PROPKA3 protonated structures. The computed partial atomic charges can be downloaded in common data formats or visualised via the powerful Mol* viewer. The αCharges application is freely available at https://alphacharges.ncbr.muni.cz with no login requirement.
- MeSH
  - Algorithms
  - Protein Conformation
  - Humans
  - Proteins * chemistry
  - Proteome
  - Software *
  - Computational Biology * instrumentation methods
- Check Tag
  - Humans
- Publication type
  - Journal Article
  - Research Support, Non-U.S. Gov't
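
To give a feel for how an empirical method can assign partial atomic charges from geometry alone, the sketch below solves the classic electronegativity equalization method (EEM) equations as a small linear system. This is deliberately a simpler relative of SQE+qp, not the method used by αCharges, and the parameter values (`electronegativity`, `hardness`, `kappa`) are hypothetical placeholders in arbitrary units rather than fitted parameters.

```python
import numpy as np


def eem_charges(coords, electronegativity, hardness, kappa=0.5, total_charge=0.0):
    """Solve the EEM equations for partial charges q_i.

    For every atom i:
        A_i + B_i * q_i + kappa * sum_{j != i} q_j / R_ij = chi_bar
    together with the constraint sum_i q_i = total_charge.
    The unknowns are the charges q_i and the equalized electronegativity chi_bar.
    """
    coords = np.asarray(coords, dtype=float)
    a = np.asarray(electronegativity, dtype=float)
    b = np.asarray(hardness, dtype=float)
    n = len(a)

    # Pairwise interatomic distances R_ij.
    distances = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Off-diagonal entries kappa / R_ij; the diagonal is overwritten with B_i.
    with np.errstate(divide="ignore"):
        coupling = kappa / distances
    np.fill_diagonal(coupling, b)

    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = coupling
    system[:n, n] = -1.0   # coefficient of chi_bar in each atom equation
    system[n, :n] = 1.0    # total-charge constraint row

    rhs = np.append(-a, total_charge)
    solution = np.linalg.solve(system, rhs)
    return solution[:n]    # drop chi_bar, keep the charges


# Hypothetical diatomic: atom 0 is given the larger electronegativity parameter.
toy_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
toy_charges = eem_charges(
    toy_coords,
    electronegativity=[3.0, 2.0],  # placeholder A_i values
    hardness=[2.0, 2.0],           # placeholder B_i values
)
```

In this toy case the atom with the larger electronegativity parameter ends up with the negative partial charge and the charges sum to the requested total, which is the qualitative behaviour one expects from charge-equalization schemes.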