Benchmarks for interpretation of QSAR models
Status: PubMed-not-MEDLINE
Language: English
Country: Great Britain, England
Media: electronic
Document type: Journal Article
Grant support:
CZ.02.1.01/0.0/0.0/16_019/0000868: European Regional Development Fund
LM2018131: Ministerstvo Školství, Mládeže a Tělovýchovy (Ministry of Education, Youth and Sports of the Czech Republic)
TN01000013: Technology Agency of the Czech Republic
PubMed: 34039411
PubMed Central: PMC8157407
DOI: 10.1186/s13321-021-00519-x
PII: 10.1186/s13321-021-00519-x
- Keywords: Atom contributions, Benchmark data set, Graph convolutional neural networks, Interpretability metrics, QSAR model interpretation, Synthetic data set
- Publication type: Journal Article
Interpretation of QSAR models is useful for understanding the complex nature of biological or physicochemical processes, guiding structural optimization, and performing knowledge-based validation of QSAR models. Highly predictive models are usually complex, and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpreting these models exist; however, it is difficult to evaluate and compare the performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets whose end-points are determined by pre-defined patterns. These data sets are intended for evaluating the ability of interpretation approaches to retrieve these patterns. They represent tasks of different complexity levels, from simple atom-based additive properties to a pharmacophore hypothesis. We proposed several quantitative metrics of interpretation performance. The applicability of the benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously suggested universal ML-agnostic approach for structural interpretation. We anticipate that these benchmarks will be useful for evaluating new interpretation approaches and investigating the decision making of complex "black box" models.
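The core idea of benchmarking an interpretation method against a known, pre-defined pattern can be illustrated with a toy, dependency-free sketch. Everything in it is an assumption made for illustration and is not the authors' benchmark code: molecules are plain lists of element symbols, the end-point is a hypothetical additive property (the number of nitrogen atoms), atom contributions are taken as the drop in prediction when a single atom is deleted (in the spirit of, but not identical to, the universal structural interpretation approach mentioned above), and ROC AUC is used as one possible quantitative metric of how well those contributions recover the true pattern.

```python
"""Toy sketch: additive benchmark end-point, removal-based atom contributions,
and a ranking metric against the known ground truth.

Illustrative assumptions (not from the paper): a molecule is a list of element
symbols, the end-point is a nitrogen count, and a "model" is any callable that
maps a molecule to a number.
"""
from typing import Callable, List, Sequence

Molecule = List[str]  # hypothetical representation: one symbol per heavy atom


def nitrogen_count(mol: Molecule) -> float:
    """Additive end-point: every N atom contributes +1, every other atom 0."""
    return float(sum(1 for atom in mol if atom == "N"))


def ground_truth_contributions(mol: Molecule) -> List[float]:
    """Per-atom contributions implied by the pre-defined pattern."""
    return [1.0 if atom == "N" else 0.0 for atom in mol]


def removal_contributions(model: Callable[[Molecule], float], mol: Molecule) -> List[float]:
    """Contribution of atom i = prediction(mol) - prediction(mol without atom i)."""
    full = model(mol)
    return [full - model(mol[:i] + mol[i + 1:]) for i in range(len(mol))]


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random pattern atom outscores a random non-pattern atom.

    Ties count as 0.5, as in the Mann-Whitney formulation of ROC AUC.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative atom")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    mol = ["C", "C", "N", "C", "O", "N"]
    labels = [c > 0 for c in ground_truth_contributions(mol)]

    # A "model" that learned the pattern exactly recovers it perfectly.
    perfect = removal_contributions(nitrogen_count, mol)
    print(perfect, roc_auc(perfect, labels))  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0] 1.0

    # A flawed model that also rewards oxygen gets a lower interpretation score.
    def flawed(m: Molecule) -> float:
        return float(sum(1 for a in m if a in ("N", "O")))

    print(roc_auc(removal_contributions(flawed, mol), labels))  # 0.875
```

Deleting an element from a list sidesteps what removing an atom from a real molecular graph involves (valences, broken rings, descriptor recalculation), which a practical implementation for fingerprint-based models or graph neural networks would have to handle. The point of the toy is only that, because the end-point is defined by a known pattern, any interpretation method's output can be scored objectively.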