P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure
Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Medium: electronic
Document type: journal articles
Grant support:
- 1556217, Charles University in Prague
- SVV 260451, Charles University in Prague
PubMed: 30109435
PubMed Central: PMC6091426
DOI: 10.1186/s13321-018-0285-8
PII: 10.1186/s13321-018-0285-8
Keywords: Binding site prediction, Ligand binding sites, Machine learning, Protein pockets, Protein surface descriptors, Random forests
Publication type: journal articles (MeSH)
BACKGROUND: Ligand binding site prediction from protein structure has many applications related to the elucidation of protein function and to structure-based drug discovery. It often represents only one step of many in complex computational drug design efforts. Although many methods have been published to date, only a few of them are suitable for use in automated pipelines or for processing large datasets. These use cases require stability and speed, which disqualifies many of the recently introduced tools that are either template based or available only as web servers.

RESULTS: We present P2Rank, a stand-alone, template-free tool for the prediction of ligand binding sites based on machine learning. It is based on predicting the ligandability of local chemical neighbourhoods centered on points placed on the solvent accessible surface of a protein. We show that P2Rank outperforms several existing tools, including two widely used stand-alone tools (Fpocket, SiteHound), a comprehensive consensus-based tool (MetaPocket 2.0), and a recent deep learning-based method (DeepSite). P2Rank is among the fastest available tools (it requires under 1 s for prediction on one protein), with the additional advantage of a multi-threaded implementation.

CONCLUSIONS: P2Rank is a new open source software package for ligand binding site prediction from protein structure. It is available as a user-friendly stand-alone command line program and as a Java library. P2Rank has a lightweight installation and does not depend on other bioinformatics tools or on large structural or sequence databases. Thanks to its speed and its ability to make fully automated predictions, it is particularly well suited for processing large datasets or as a component of scalable structural bioinformatics pipelines.
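The RESULTS paragraph summarizes the core idea: points are placed on the solvent accessible surface (SAS), and the ligandability of each point's local chemical neighbourhood is predicted. The Python sketch that follows illustrates that general pipeline on a toy structure. It is a simplified illustration under stated assumptions, not P2Rank's implementation (P2Rank is distributed as a Java command line program and library and, per the keywords, uses random forests over protein surface descriptors). The function names, the uniform atom and probe radii, the single buriedness feature standing in for the trained classifier, the 6 Å neighbourhood cutoff, and the clustering and ranking step are all hypothetical choices made for illustration.

```python
"""Toy illustration of SAS-point ligandability scoring (NOT P2Rank's actual code).

Assumptions (hypothetical, for illustration only):
- every atom has the same radius (1.7 A) and the probe radius is 1.4 A;
- a single buriedness feature (atom count within 6 A) stands in for a
  trained classifier and its many physico-chemical descriptors;
- ligandable points are grouped by single-linkage clustering and the
  resulting clusters are ranked by summed score.
"""
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sphere_points(n: int) -> List[Point]:
    """Roughly uniform points on the unit sphere (golden-section spiral)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n   # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - y * y)      # radius of the circle at height y
        theta = golden * i
        pts.append((r * math.cos(theta), y, r * math.sin(theta)))
    return pts


def sas_points(atoms: Sequence[Point], radius: float = 1.7,
               probe: float = 1.4, n: int = 30) -> List[Point]:
    """Place candidate points at radius + probe around each atom and keep only
    those not inside another atom's expanded sphere (Shrake-Rupley style)."""
    shell = radius + probe
    unit = sphere_points(n)
    kept = []
    for i, (ax, ay, az) in enumerate(atoms):
        for ux, uy, uz in unit:
            p = (ax + shell * ux, ay + shell * uy, az + shell * uz)
            if all(j == i or math.dist(p, b) >= shell for j, b in enumerate(atoms)):
                kept.append(p)
    return kept


def score_points(points: Sequence[Point], atoms: Sequence[Point],
                 cutoff: float = 6.0) -> List[float]:
    """Stand-in for the classifier: number of atoms within `cutoff` of each
    point (a crude buriedness measure), scaled so the top point scores 1.0."""
    counts = [sum(1 for a in atoms if math.dist(p, a) <= cutoff) for p in points]
    top = max(counts, default=0)
    return [c / top for c in counts] if top else [0.0] * len(counts)


def cluster_points(points: Sequence[Point], scores: Sequence[float],
                   threshold: float = 0.5, link: float = 3.0) -> List[List[int]]:
    """Single-linkage clustering of points scoring >= threshold; clusters are
    lists of point indices, ranked by summed score (highest first)."""
    candidates = [i for i, s in enumerate(scores) if s >= threshold]
    seen, clusters = set(), []
    for start in candidates:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # flood fill over points linked within `link` angstroms
            i = stack.pop()
            members.append(i)
            for j in candidates:
                if j not in seen and math.dist(points[i], points[j]) <= link:
                    seen.add(j)
                    stack.append(j)
        clusters.append(members)
    clusters.sort(key=lambda c: sum(scores[i] for i in c), reverse=True)
    return clusters


if __name__ == "__main__":
    # Toy "protein": a 4 x 4 sheet of atoms with a smaller sheet 8 A above it,
    # leaving a cleft between them where points are more buried.
    atoms = [(3.0 * x, 3.0 * y, 0.0) for x in range(4) for y in range(4)]
    atoms += [(3.0 * x, 3.0 * y, 8.0) for x in range(2) for y in range(2)]
    pts = sas_points(atoms)
    scores = score_points(pts, atoms)
    for rank, c in enumerate(cluster_points(pts, scores), start=1):
        print(rank, len(c), round(sum(scores[i] for i in c), 2))
```

Replacing `score_points` with a trained model over richer per-point features is where actual machine learning would enter; in this sketch the point placement and the aggregation of scored points into ranked sites stay independent of how the per-point score is computed.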
CryptoBench: cryptic protein-ligand binding sites dataset and benchmark
A computational workflow for analysis of missense mutations in precision oncology
PrankWeb: a web server for ligand binding site prediction and visualization