Most cited article - PubMed ID 36282546
AHoJ: rapid, tailored search and retrieval of apo and holo protein structures for user-defined ligands
Knowledge of protein-ligand binding sites (LBSs) is crucial for advancing our understanding of biology and developing practical applications in fields such as medicine or biotechnology. PrankWeb is a web server that allows users to predict LBSs from a given three-dimensional structure. It provides access to P2Rank, a state-of-the-art machine learning tool for binding site prediction. Here, we present a new version of PrankWeb enabling the development of both client- and server-side modules acting as postprocessing tasks on the predicted pockets. Furthermore, each module can be associated with a visualization module that acts on the results provided by both client- and server-side modules. This newly developed system was utilized to implement the ability to dock user-provided molecules into the predicted pockets using AutoDock Vina (server-side module) and to interactively visualize the predicted poses (visualization module). In addition to introducing a modular architecture, we revamped PrankWeb's interface to better support the modules and enhance user interaction between the 1D and 3D viewers. We also introduced a new, faster P2Rank backend and user-friendly exports, including ChimeraX visualization.
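Docking into a predicted pocket requires a search region, which AutoDock Vina takes as a box given by a center and a size along each axis. The abstract does not say how PrankWeb derives that box, so the following is only a minimal sketch under an assumption: the box is the axis-aligned bounding box of the pocket atoms, enlarged by a fixed margin. The function name `docking_box`, the `padding` parameter, and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def docking_box(coords: Iterable[Point], padding: float = 4.0) -> Tuple[Point, Point]:
    """Return (center, size) of an axis-aligned box around pocket atoms.

    Assumption for illustration only: the box is the bounding box of the
    given coordinates, extended by `padding` (in angstroms) on every side.
    """
    points = list(coords)
    if not points:
        raise ValueError("at least one coordinate is required")

    # Per-axis extremes of the pocket atoms.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]

    # Box center is the midpoint of the extremes; size adds the margin twice.
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


# Hypothetical pocket atom coordinates (angstroms).
pocket_atoms = [(10.0, 2.0, -3.0), (14.0, 6.0, 1.0), (12.0, 4.0, -1.0)]
center, size = docking_box(pocket_atoms)
print("center:", center)
print("size:", size)
```

The resulting center and size correspond to the values a docking run would receive as its search-box parameters; a real pipeline would likely also cap the box size and account for ligand dimensions.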
- MeSH
  - Internet
  - Protein Conformation
  - Ligands
  - Proteins * chemistry metabolism
  - Molecular Docking Simulation
  - Software *
  - Machine Learning
  - User-Computer Interface
  - Protein Binding
  - Binding Sites
- Publication type
  - Journal Article
- Names of Substances
  - Ligands
  - Proteins *
MOTIVATION: Structure-based methods for detecting protein-ligand binding sites play a crucial role in various domains, from fundamental research to biomedical applications. However, current prediction methodologies often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of the apo (ligand-free) states. This oversight is particularly problematic in the case of cryptic binding sites (CBSs), where holo-based assessment yields unrealistic performance expectations.
RESULTS: To advance the development in this domain, we introduce CryptoBench, a benchmark dataset tailored for training and evaluating novel CBS prediction methodologies. CryptoBench is constructed upon a large collection of apo-holo protein pairs, grouped by UniProtID, clustered by sequence identity, and filtered to contain only structures with substantial structural change in the binding site. CryptoBench comprises 1107 structures with predefined cross-validation splits, making it the most extensive CBS dataset to date. To establish a performance baseline, we measured the predictive power of sequence- and structure-based CBS residue prediction methods using the benchmark. We selected PocketMiner as the state-of-the-art representative of the structure-based methods for CBS detection, and P2Rank, a widely used structure-based method for general binding site prediction that is not specifically tailored for cryptic sites. For sequence-based approaches, we trained a neural network to classify binding residues using protein language model embeddings. Our sequence-based approach outperformed PocketMiner and P2Rank across key metrics, including area under the curve, area under the precision-recall curve, Matthews correlation coefficient, and F1 scores. These results provide baseline benchmark results for future CBS and potentially also non-CBS prediction endeavors, leveraging CryptoBench as the foundational platform for further advancements in the field.
AVAILABILITY AND IMPLEMENTATION: The CryptoBench dataset, including the benchmark model, is available on Open Science Framework at https://osf.io/pz4a9/. The code and tutorial are available in the GitHub repository at https://github.com/skrhakv/CryptoBench/.
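Two of the threshold-based metrics named above, the F1 score and the Matthews correlation coefficient (MCC), can be computed directly from per-residue binary labels. The sketch below uses their standard definitions on toy data; it is not the CryptoBench evaluation code, and the label vectors and function names are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks a binding residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy per-residue labels (hypothetical): 1 = binding residue, 0 = non-binding.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("F1:", round(f1_score(tp, fp, fn), 4))
print("MCC:", round(mcc(tp, tn, fp, fn), 4))
```

Because binding residues are typically a small minority of all residues, MCC and the precision-recall curve tend to be more informative than plain accuracy for this kind of task.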
- MeSH
  - Benchmarking
  - Databases, Protein
  - Protein Conformation
  - Ligands
  - Proteins * chemistry metabolism
  - Software *
  - Protein Binding
  - Binding Sites
  - Computational Biology * methods
- Publication type
  - Journal Article
- Names of Substances
  - Ligands
  - Proteins *