Benchmarks for interpretation of QSAR models
Status: PubMed-not-MEDLINE; Language: English; Country: United Kingdom, England; Medium: electronic
Document type: journal article
Grant support
CZ.02.1.01/0.0/0.0/16_019/0000868
European Regional Development Fund
LM2018131
Ministerstvo Školství, Mládeže a Tělovýchovy
TN01000013
Technology Agency of the Czech Republic
PubMed
34039411
PubMed Central
PMC8157407
DOI
10.1186/s13321-021-00519-x
PII: 10.1186/s13321-021-00519-x
- Keywords
- Atom contributions, Benchmark data set, Graph convolutional neural networks, Interpretability metrics, QSAR model interpretation, Synthetic data set
- Publication type
- journal article
Interpretation of QSAR models is useful to understand the complex nature of biological or physicochemical processes, guide structural optimization or perform knowledge-based validation of QSAR models. Highly predictive models are usually complex and their interpretation is non-trivial. This is particularly true for modern neural networks. Various approaches to interpretation of these models exist. However, it is difficult to evaluate and compare the performance and applicability of these ever-emerging methods. Herein, we developed several benchmark data sets with end-points determined by pre-defined patterns. These data sets are intended for evaluating the ability of interpretation approaches to retrieve these patterns. They represent tasks of different complexity, from simple atom-based additive properties to pharmacophore hypotheses. We proposed several quantitative metrics of interpretation performance. The applicability of the benchmarks and metrics was demonstrated on a set of conventional models and end-to-end graph convolutional neural networks, interpreted by the previously suggested universal ML-agnostic approach for structural interpretation. We anticipate these benchmarks to be useful in evaluating new interpretation approaches and investigating the decision making of complex "black box" models.
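The idea of an end-point determined by a pre-defined pattern, together with a quantitative interpretation metric, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the molecule is reduced to a list of element symbols, the toy end-point is the number of nitrogen atoms, and `top_n_score` is a hypothetical metric that checks how many of the truly contributing atoms are ranked highest by an interpretation method.

```python
from typing import List, Sequence


def true_contributions(atoms: Sequence[str], pattern: str = "N") -> List[float]:
    """Ground truth for a toy additive end-point: each `pattern` atom adds 1."""
    return [1.0 if atom == pattern else 0.0 for atom in atoms]


def endpoint(atoms: Sequence[str], pattern: str = "N") -> float:
    """Synthetic end-point value: the sum of the ground-truth atom contributions."""
    return sum(true_contributions(atoms, pattern))


def top_n_score(true_contribs: Sequence[float], est_contribs: Sequence[float]) -> float:
    """Hypothetical metric: fraction of truly contributing atoms found among
    the n highest-ranked atoms, where n is the number of truly contributing atoms."""
    positives = {i for i, c in enumerate(true_contribs) if c > 0}
    n = len(positives)
    if n == 0:
        raise ValueError("molecule has no atoms matching the pattern")
    # Rank atom indices by estimated contribution, largest first.
    ranked = sorted(range(len(est_contribs)), key=lambda i: est_contribs[i], reverse=True)
    return len(positives.intersection(ranked[:n])) / n


# Toy molecule given only as a list of heavy-atom symbols (assumption).
atoms = ["C", "C", "N", "O", "N"]
truth = true_contributions(atoms)

# Made-up atom contributions, standing in for the output of an interpretation method.
estimated = [0.1, 0.0, 0.9, 0.6, 0.2]
score = top_n_score(truth, estimated)
```

Because the ground truth is known by construction, the score depends only on how well the interpretation method recovers the pattern, which is the property such benchmark data sets are designed to expose.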