Nonpher: computational method for design of hard-to-synthesize structures

. 2017 Mar 20 ; 9 (1) : 20. [epub] 20170320

Status: PubMed-not-MEDLINE; Language: English; Country: England, Great Britain; Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid29086122
Links

PubMed 29086122
PubMed Central PMC5359269
DOI 10.1186/s13321-017-0206-2
PII: 10.1186/s13321-017-0206-2

In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinct classes such as active/inactive or toxic/nontoxic. To train a classifier, the training data set must contain examples of both the positive and the negative class. While biological activity or toxicity can be measured experimentally, another important molecular property, synthetic feasibility, is a more abstract feature that cannot be easily assessed. In the present paper, we introduce Nonpher, a computational method for the construction of a hard-to-synthesize virtual library. Nonpher is based on a molecular morphing algorithm in which new structures are iteratively generated by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures that are not overly complex but just hard enough to synthesize. Nonpher results were compared with those of SAscore and dense region (DR), two other methods for generating hard-to-synthesize compounds. A random forest classifier trained on Nonpher data achieves better results than models trained on SAscore and DR data.
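The morphing idea described above can be illustrated with a toy sketch. It is not Nonpher's (or Molpher's) implementation, and several parts are assumptions: molecules are modelled as connected graphs with single bonds only, the valence table for C, N and O is hypothetical, `toy_complexity` is a made-up stand-in score (not SAscore), and stopping once the score crosses a threshold is an assumed criterion rather than the one Nonpher uses. The four operators mirror the kinds of simple structural changes the abstract lists: adding or removing an atom or a bond.

```python
import random
from dataclasses import dataclass, field

# Hypothetical valence limits for a toy element set (assumption; not Nonpher's rules).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}

@dataclass
class MolGraph:
    """Toy molecule: a connected graph with single bonds only (hypothetical model)."""
    atoms: list                              # element symbol for each atom index
    bonds: set = field(default_factory=set)  # undirected bonds stored as frozenset({i, j})

    def copy(self):
        return MolGraph(list(self.atoms), set(self.bonds))

    def degree(self, i):
        return sum(1 for b in self.bonds if i in b)

    def free_valence(self, i):
        return MAX_VALENCE[self.atoms[i]] - self.degree(i)

    def is_connected(self):
        if not self.atoms:
            return True
        seen, stack = {0}, [0]
        while stack:  # depth-first search from atom 0
            i = stack.pop()
            for b in self.bonds:
                if i in b:
                    (j,) = b - {i}
                    if j not in seen:
                        seen.add(j)
                        stack.append(j)
        return len(seen) == len(self.atoms)

# Each operator returns a new molecule, or None when the move is not possible.
def add_atom(mol, rng):
    sites = [i for i in range(len(mol.atoms)) if mol.free_valence(i) > 0]
    if not sites:
        return None
    new = mol.copy()
    new.atoms.append(rng.choice(sorted(MAX_VALENCE)))
    new.bonds.add(frozenset({rng.choice(sites), len(new.atoms) - 1}))
    return new

def remove_atom(mol, rng):
    # Only terminal atoms are removed, so the remaining graph stays connected.
    leaves = [i for i in range(len(mol.atoms)) if mol.degree(i) <= 1]
    if len(mol.atoms) < 2 or not leaves:
        return None
    k = rng.choice(leaves)
    kept = [i for i in range(len(mol.atoms)) if i != k]
    remap = {old: new for new, old in enumerate(kept)}
    bonds = {frozenset(remap[x] for x in b) for b in mol.bonds if k not in b}
    return MolGraph([mol.atoms[i] for i in kept], bonds)

def add_bond(mol, rng):
    # Bonding two non-bonded atoms that both have spare valence closes a ring.
    n = len(mol.atoms)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if frozenset({i, j}) not in mol.bonds
             and mol.free_valence(i) > 0 and mol.free_valence(j) > 0]
    if not pairs:
        return None
    new = mol.copy()
    new.bonds.add(frozenset(rng.choice(pairs)))
    return new

def remove_bond(mol, rng):
    # Only ring bonds are removed: the molecule must stay in one piece.
    candidates = []
    for b in sorted(mol.bonds, key=sorted):
        trial = mol.copy()
        trial.bonds.discard(b)
        if trial.is_connected():
            candidates.append(trial)
    return rng.choice(candidates) if candidates else None

OPERATORS = (add_atom, remove_atom, add_bond, remove_bond)

def toy_complexity(mol):
    # Hypothetical stand-in score (NOT SAscore): ring count plus heteroatom count.
    rings = len(mol.bonds) - len(mol.atoms) + 1  # cyclomatic number of a connected graph
    return rings + sum(a != "C" for a in mol.atoms)

def morph_until_hard(start, score, threshold, max_steps=200, seed=0):
    """Apply random single-step changes until score(mol) >= threshold (assumed stop rule)."""
    rng = random.Random(seed)
    mol = start.copy()
    if score(mol) >= threshold:
        return mol, 0
    for step in range(1, max_steps + 1):
        new = rng.choice(OPERATORS)(mol, rng)
        if new is not None:
            mol = new
            if score(mol) >= threshold:
                return mol, step
    return mol, max_steps

if __name__ == "__main__":
    propane = MolGraph(["C", "C", "C"], {frozenset({0, 1}), frozenset({1, 2})})
    hard, steps = morph_until_hard(propane, toy_complexity, threshold=5)
    print(steps, hard.atoms, sorted(sorted(b) for b in hard.bonds))
```

Each operator rejects moves that would disconnect the graph or exceed a valence, so every intermediate is still a valid toy structure. A real implementation would instead work on full molecules through a cheminformatics toolkit such as RDKit and score them with a genuine synthetic-accessibility measure.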


Latest 20 citing articles


Profiling and analysis of chemical compounds using pointwise mutual information

. 2021 Jan 10 ; 13 (1) : 3. [epub] 20210110

SYBA: Bayesian estimation of synthetic accessibility of organic compounds

. 2020 May 20 ; 12 (1) : 35. [epub] 20200520
