QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction

2020 Jun 05;12(1):41. Epub 2020 Jun 05.

Status: PubMed-not-MEDLINE
Language: English
Country: England, Great Britain
Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid33431016

Grant support
703543 H2020 Marie Skłodowska-Curie Actions
LM2015063 Ministry of Education, Youth and Sports of the Czech Republic

Links

PubMed 33431016
PubMed Central PMC7339533
DOI 10.1186/s13321-020-00444-5
PII: 10.1186/s13321-020-00444-5
Knihovny.cz E-resources

Affinity fingerprints report the activity of small molecules across a set of assays. They therefore make it possible to gather information about the bioactivities of structurally dissimilar compounds, for which models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated from Ki, Kd, IC50 and EC50 data in the ChEMBL database. QAFFP thus provide a way to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from ChEMBL for 18 diverse cancer cell lines widely used in preclinical drug discovery, as well as 25 diverse protein target data sets. This study complements part 1, in which the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. We show that, although QAFFP are inherently noisy, using them as descriptors yields test-set prediction errors of ~0.65-0.95 pIC50 units, comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). The predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and of 1D and 2D physicochemical descriptors, with effect sizes of 0.02-0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not improve predictive power. Because the QSAR models used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
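The abstract describes a QAFFP as a vector of predicted bioactivities, with one entry per QSAR model. That vector then serves as the descriptor set for a downstream potency model, whose error is reported in pIC50 units. The Python sketch below illustrates this two-stage idea under the following assumptions:

- Compounds are toy sets of fingerprint bit indices, a hypothetical stand-in for Morgan2 fingerprints.
- Every base "QSAR model", and the downstream model, is a k-nearest-neighbour regressor. This is swapped in for brevity and is not the learner used in the study.
- All data are invented.
- Root-mean-square error is assumed as the error metric.
- The helper names (`make_qsar_model`, `qaffp`, `rmse` and so on) are hypothetical.

The study's actual code is in the repository linked above.

```python
"""Minimal QAFFP sketch: predicted bioactivity profiles used as descriptors.

Assumptions (hypothetical, not the authors' implementation): compounds are toy
sets of fingerprint bit indices, each base "QSAR model" and the downstream
potency model are k-nearest-neighbour regressors, and all data are invented.
"""
import math
from typing import Callable, FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # hypothetical stand-in for a Morgan2 bit vector


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets; two empty sets count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_predict(query, train: Sequence[Tuple[object, float]],
                sim: Callable[[object, object], float], k: int) -> float:
    """Mean activity of the k training items most similar to `query` (ties keep input order)."""
    ranked = sorted(train, key=lambda item: sim(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(y for _, y in top) / len(top)


def make_qsar_model(train: Sequence[Tuple[Fingerprint, float]],
                    k: int = 2) -> Callable[[Fingerprint], float]:
    """Hypothetical base 'QSAR model': k-NN on Tanimoto similarity for one target."""
    return lambda fp: knn_predict(fp, train, tanimoto, k)


def qaffp(fp: Fingerprint, models: Sequence[Callable[[Fingerprint], float]]) -> List[float]:
    """QAFFP descriptor: one predicted activity per base QSAR model."""
    return [model(fp) for model in models]


def neg_euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Negative Euclidean distance, so that a larger value means more similar."""
    return -math.dist(u, v)  # math.dist needs Python 3.8+


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, here in pIC50 units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Stage 1 (invented data): per-target activity sets used to fit two base models.
target_a = [(frozenset({1, 2, 3}), 7.0), (frozenset({1, 2}), 6.5), (frozenset({8, 9}), 4.0)]
target_b = [(frozenset({4, 5}), 8.0), (frozenset({5, 6}), 7.5), (frozenset({1, 9}), 5.0)]
models = [make_qsar_model(target_a), make_qsar_model(target_b)]

# Stage 2 (invented data): cell-line pIC50 values, re-encoded as QAFFP vectors.
train_cells = [(frozenset({1, 2, 3}), 6.8), (frozenset({4, 5, 6}), 5.2), (frozenset({8, 9}), 4.1)]
test_cells = [(frozenset({1, 3}), 6.6), (frozenset({4, 6}), 5.0)]

train_qaffp = [(qaffp(fp, models), y) for fp, y in train_cells]
preds = [knn_predict(qaffp(fp, models), train_qaffp, neg_euclidean, k=1)
         for fp, _ in test_cells]
holdout_rmse = rmse([y for _, y in test_cells], preds)
print(preds, round(holdout_rmse, 3))
```

The downstream model never sees the structural fingerprints, only the vector of predicted activities. Compounds with dissimilar structures but similar predicted profiles can therefore end up as neighbours. In the study itself, each QAFFP has 1360 entries (one per ChEMBL-derived QSAR model) rather than two.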

Latest citing articles

Profiling and analysis of chemical compounds using pointwise mutual information

2021 Jan 10;13(1):3. Epub 2021 Jan 10.
