This record comes from PubMed

Benchmarks for interpretation of QSAR models

Matveieva, Mariia: Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
Polishchuk, Pavel: Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic. pavlo.polishchuk@upol.cz

Source: Journal of Cheminformatics (J Cheminform), ISSN 1758-2946. 2021 May 26; 13(1): 41. Epub 2021 May 26.

Status: PubMed-not-MEDLINE; Language: English; Country: Great Britain, England; Media: electronic
Document type: Journal Article
Persistent link: https://www.medvik.cz/link/pmid34039411

Grant support
CZ.02.1.01/0.0/0.0/16_019/0000868 European Regional Development Fund
LM2018131 Ministerstvo Školství, Mládeže a Tělovýchovy
TN01000013 Technology Agency of the Czech Republic

PubMed: 34039411
PubMed Central: PMC8157407
DOI: 10.1186/s13321-021-00519-x
PII: 10.1186/s13321-021-00519-x

Keywords
Atom contributions, Benchmark data set, Graph convolutional neural networks, Interpretability metrics, QSAR model interpretation, Synthetic data set
Publication type
Journal Article

Interpretation of QSAR models is useful for understanding the complex nature of biological or physicochemical processes, guiding structural optimization, and performing knowledge-based validation of QSAR models. Highly predictive models are usually complex, and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpreting these models exist; however, it is difficult to evaluate and compare the performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets whose end-points are determined by pre-defined patterns. These data sets are intended for evaluating the ability of interpretation approaches to retrieve those patterns. They represent tasks of different complexity levels, from simple atom-based additive properties to pharmacophore hypotheses. We proposed several quantitative metrics of interpretation performance. The applicability of the benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously suggested universal ML-agnostic approach for structural interpretation. We anticipate that these benchmarks will be useful for evaluating new interpretation approaches and for investigating the decision making of complex "black box" models.
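
The idea of a benchmark whose end-point is fixed by a pre-defined pattern can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the article: molecules are reduced to plain lists of element symbols, the end-point is the nitrogen count (one possible "atom-based additive property"), atom contributions are estimated by removing one atom at a time and re-predicting, and `top_n_score` is one plausible metric rather than the metrics the authors actually proposed.

```python
from typing import Callable, List, Sequence

# Toy representation (assumption): a molecule is just a list of element symbols.
Molecule = Sequence[str]
Model = Callable[[Molecule], float]


def endpoint(mol: Molecule) -> float:
    """Synthetic end-point: the number of nitrogen atoms (simple additive property)."""
    return float(sum(1 for atom in mol if atom == "N"))


def true_contributions(mol: Molecule) -> List[float]:
    """Ground-truth pattern: each nitrogen contributes 1, every other atom 0."""
    return [1.0 if atom == "N" else 0.0 for atom in mol]


def masked_contributions(model: Model, mol: Molecule) -> List[float]:
    """Model-agnostic estimate: prediction for the whole molecule minus the
    prediction for the molecule with atom i removed."""
    full = model(mol)
    return [full - model(list(mol[:i]) + list(mol[i + 1:])) for i in range(len(mol))]


def top_n_score(estimated: Sequence[float], truth: Sequence[float]) -> float:
    """Fraction of truly contributing atoms found among the n highest-ranked
    atoms, where n is the number of truly contributing atoms."""
    n = sum(1 for t in truth if t > 0)
    if n == 0:
        return 0.0  # no pattern atoms in this molecule; score is undefined, use 0
    ranked = sorted(range(len(estimated)), key=lambda i: estimated[i], reverse=True)
    return sum(1 for i in ranked[:n] if truth[i] > 0) / n


def biased_model(mol: Molecule) -> float:
    """Hypothetical flawed model that also rewards oxygen atoms."""
    return float(sum(1 for a in mol if a == "N") + 2 * sum(1 for a in mol if a == "O"))


if __name__ == "__main__":
    mol = ["C", "C", "N", "O", "N"]
    truth = true_contributions(mol)
    print(top_n_score(masked_contributions(endpoint, mol), truth))      # perfect model
    print(top_n_score(masked_contributions(biased_model, mol), truth))  # flawed model
```

In this toy setting a model that reproduces the end-point exactly recovers the nitrogen pattern, while the hypothetical `biased_model` ranks the oxygen atom first and scores lower. That is the kind of difference such a benchmark is meant to expose.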
