Article

Record sourced from PubMed

Leveraging large language models for literature-driven prioritization of protein binding pockets

Authors and affiliations:
Stratiichuk, Roman: Receptor.AI Inc., London N1 7GU, United Kingdom; Department of Biophysics and Medical Informatics, Educational and Scientific Centre "Institute of Biology and Medicine", Taras Shevchenko Kyiv National University, Kyiv 01601, Ukraine
Melnychenko, Mykola: Receptor.AI Inc., London N1 7GU, United Kingdom
Koleiev, Ihor: Receptor.AI Inc., London N1 7GU, United Kingdom; Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine, Kyiv 03038, Ukraine
Voitsitskyi, Taras: Receptor.AI Inc., London N1 7GU, United Kingdom; Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine, Kyiv 03038, Ukraine
Husak, Vladyslav: Receptor.AI Inc., London N1 7GU, United Kingdom; Department of Cellular, Computational and Integrative Biology, The University of Trento, Povo, Trento 38123, Italy
Shevchuk, Nazar: Receptor.AI Inc., London N1 7GU, United Kingdom
Ostrovsky, Zakhar: Receptor.AI Inc., London N1 7GU, United Kingdom
Bdzhola, Volodymyr: Institute of Molecular Biology and Genetics of The National Academy of Sciences of Ukraine, Kyiv 03143, Ukraine
Yesylevskyy, Semen: Receptor.AI Inc., London N1 7GU, United Kingdom; Department of Physics of Biological Systems, Institute of Physics of The National Academy of Sciences of Ukraine, Kyiv 03038, Ukraine; Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 6 CZ-166 10, Czech Republic; Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc, Olomouc 771 46, Czech Republic
Starosyla, Serhii: Receptor.AI Inc., London N1 7GU, United Kingdom
Nafiiev, Alan: Receptor.AI Inc., London N1 7GU, United Kingdom

Bioinformatics (Oxford, England). 2025 Aug 02; 41(8).

Source: Bioinformatics, ISSN 1367-4811 | 1367-4803

Language: English. Country: Great Britain, England. Medium: print

Document type: journal articles

Persistent link: https://www.medvik.cz/link/pmid40795239

Grant support
101101923 Ministry of Education

Online full text

PubMed 40795239
PubMed Central PMC12371332
DOI 10.1093/bioinformatics/btaf449
PII: 8225722

MOTIVATION: Accurately identifying and prioritizing protein binding pockets is a foundational element of small-molecule drug discovery. Defining known pockets currently relies on a laborious manual process: extracting key residue data from selected publications, reconciling inconsistent terminology, and independently computing volumetric representations. This manual curation, done to ensure biological relevance, is time-consuming and error-prone, and it is a major bottleneck for efficient, high-throughput drug discovery.

RESULTS: We present a novel approach for the identification and prioritization of protein binding pockets for small molecules that combines geometric pocket detection with large language models (LLMs). Our method uses Fpocket to generate candidate pockets, which are then validated against published experimental data extracted from research articles by an LLM, using a series of prompts fine-tuned to identify and extract residue-level information associated with experimentally confirmed binding sites. We developed a curated benchmark dataset of diverse proteins and associated literature to train and evaluate the LLM's performance in paper relevance assessment and pocket extraction.

AVAILABILITY AND IMPLEMENTATION: The developed benchmark dataset and methodology are freely available at the GitHub repository (https://github.com/receptor-ai/LLM-benchmark-dataset) and Zenodo (DOI: 10.5281/zenodo.15798647).
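The validation step described in the abstract, matching geometrically detected candidate pockets against residues reported in the literature, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how candidates are scored, so the coverage metric below (fraction of literature-reported residues that line a candidate pocket) is an assumption, and the `Pocket` class, the `coverage` and `prioritize` functions, the residue identifier format, and all example data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pocket:
    """A candidate pocket, e.g. one parsed from a geometric detector's output."""
    name: str
    residues: frozenset  # residue identifiers such as "A:ASP112" (assumed format)


def coverage(pocket: Pocket, literature: frozenset) -> float:
    """Fraction of literature-reported residues that line the pocket.

    Assumed scoring rule for illustration only.
    """
    if not literature:
        return 0.0
    return len(pocket.residues & literature) / len(literature)


def prioritize(pockets, literature):
    """Return (pocket, score) pairs sorted by descending coverage."""
    scored = [(p, coverage(p, literature)) for p in pockets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates and hypothetical literature-extracted residues.
candidates = [
    Pocket("pocket1", frozenset({"A:LEU10", "A:GLY11", "A:VAL18"})),
    Pocket("pocket2", frozenset({"A:ASP112", "A:TYR115", "A:PHE200", "A:LEU10"})),
]
reported = frozenset({"A:ASP112", "A:TYR115", "A:PHE200"})

ranking = prioritize(candidates, reported)
for pocket, score in ranking:
    print(f"{pocket.name}: {score:.2f}")
```

A set-overlap score like this only needs residue identifiers from both sources to share one numbering scheme; in practice, reconciling chain and residue numbering between a structure file and a publication would likely be the harder part.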


