CryptoBench: cryptic protein-ligand binding sites dataset and benchmark

2024 Dec 26; 41(1)

Language: English; Country: Great Britain, England; Medium: print

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid39693053

Grant support
23-07349S Czech Science Foundation

MOTIVATION: Structure-based methods for detecting protein-ligand binding sites play a crucial role in domains ranging from fundamental research to biomedical applications. However, current prediction methods often rely on holo (ligand-bound) protein conformations for training and evaluation, overlooking the significance of apo (ligand-free) states. This oversight is particularly problematic for cryptic binding sites (CBSs), where holo-based assessment yields unrealistic performance expectations.

RESULTS: To advance development in this domain, we introduce CryptoBench, a benchmark dataset tailored for training and evaluating novel CBS prediction methods. CryptoBench is built from a large collection of apo-holo protein pairs, grouped by UniProt ID, clustered by sequence identity, and filtered to contain only structures with substantial structural change in the binding site. CryptoBench comprises 1107 structures with predefined cross-validation splits, making it the most extensive CBS dataset to date. To establish a performance baseline, we measured the predictive power of sequence- and structure-based CBS residue prediction methods on the benchmark. We selected PocketMiner as the state-of-the-art representative of structure-based methods for CBS detection, and P2Rank as a widely used structure-based method for general binding site prediction that is not specifically tailored to cryptic sites. For the sequence-based approach, we trained a neural network to classify binding residues using protein language model embeddings. Our sequence-based approach outperformed PocketMiner and P2Rank across key metrics, including area under the curve, area under the precision-recall curve, Matthews correlation coefficient, and F1 score. These results provide a baseline for future CBS, and potentially also non-CBS, prediction efforts, with CryptoBench serving as a foundation for further advances in the field.

AVAILABILITY AND IMPLEMENTATION: The CryptoBench dataset, including the benchmark model, is available on the Open Science Framework: https://osf.io/pz4a9/. The code and tutorial are available in the GitHub repository: https://github.com/skrhakv/CryptoBench/.
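
The four metrics named in the Results can be illustrated on per-residue predictions with a minimal Python sketch. This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. It also assumes that "area under the curve" refers to the ROC curve and approximates the area under the precision-recall curve by average precision.

```python
import math

def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels; the threshold is an assumed cutoff."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean precision at the rank of each positive; ties keep input order (a simplification)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

if __name__ == "__main__":
    # Hypothetical toy data: 1 = binding residue, 0 = non-binding residue.
    labels = [1, 1, 0, 0, 1, 0, 0, 0]
    scores = [0.9, 0.6, 0.7, 0.2, 0.4, 0.1, 0.3, 0.05]
    tp, fp, tn, fn = confusion_counts(labels, scores)
    print("F1:", f1_score(tp, fp, fn))
    print("MCC:", mcc(tp, fp, tn, fn))
    print("AUC:", roc_auc(labels, scores))
    print("AUPRC (average precision):", average_precision(labels, scores))
```

Binding residues are usually a small minority of all residues, which is one reason to report MCC and the precision-recall curve alongside AUC: both are less flattered by a large number of easy negatives.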

