QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction
Status: PubMed-not-MEDLINE
Language: English
Country: England, United Kingdom
Medium: electronic
Document type: journal article
Grant support
703543
H2020 Marie Skłodowska-Curie Actions
LM2015063
Ministry of Education, Youth and Sports of the Czech Republic
PubMed
33431016
PubMed Central
PMC7339533
DOI
10.1186/s13321-020-00444-5
PII: 10.1186/s13321-020-00444-5
- Keywords
- Affinity fingerprints, Bioactivity modeling, ChEMBL, Cytotoxicity, Drug sensitivity, Drug sensitivity prediction, QSAR
- Publication type
- Journal article
Affinity fingerprints report the activity of small molecules across a set of assays. They therefore make it possible to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated from Ki, Kd, IC50 and EC50 data in the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery and for 25 diverse protein target data sets. This study complements part 1, in which the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Although QAFFP are inherently noisy, we show that using them as descriptors leads to prediction errors on the test set in the range of approximately 0.65-0.95 pIC50 units, which is comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the range of 0.02-0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not improve predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
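
The idea of using predicted bioactivity profiles as descriptors can be illustrated with a minimal sketch. It is not taken from the repository above and does not reproduce the authors' pipeline: the three linear "QSAR models", the toy descriptor vectors, and the nearest-neighbour regressor are all hypothetical stand-ins chosen to keep the example self-contained. The sketch shows only the general pattern: each compound is encoded as a vector of per-target predictions, compounds are compared in that bioactivity space, and the error on held-out compounds is summarized as an RMSE in pIC50 units.

```python
import math


def make_linear_model(weights, bias):
    """Hypothetical stand-in for one pre-trained per-target QSAR model."""
    def predict(x):
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return predict


# Assumption: three toy models; the abstract describes a set of 1360 real ones.
QSAR_MODELS = [
    make_linear_model([0.5, -0.2, 0.1], 5.0),
    make_linear_model([-0.3, 0.4, 0.2], 6.0),
    make_linear_model([0.1, 0.1, -0.5], 5.5),
]


def qaffp(x, models=QSAR_MODELS):
    """Affinity fingerprint: one predicted activity value per QSAR model."""
    return [model(x) for model in models]


def euclidean(a, b):
    """Distance between two fingerprints in bioactivity space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(query_fp, train_fps, train_y, k=2):
    """Predict potency as the mean of the k nearest training compounds.

    A simple nearest-neighbour regressor is used here purely for
    illustration; it is not claimed to be the learner used in the study.
    """
    ranked = sorted(zip(train_fps, train_y),
                    key=lambda pair: euclidean(query_fp, pair[0]))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs (pIC50)."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Toy structural descriptors and made-up pIC50 values (illustration only).
train_x = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
train_y = [6.1, 5.4, 7.0, 5.9]
test_x = [[1, 0, 1], [0, 1, 1]]
test_y = [6.5, 6.3]

train_fps = [qaffp(x) for x in train_x]
test_pred = [knn_predict(qaffp(x), train_fps, train_y) for x in test_x]
test_rmse = rmse(test_y, test_pred)
```

In a realistic setting the per-target models would be trained on ChEMBL bioactivity data and the downstream learner would be chosen and validated separately; the structure of the computation stays the same.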