Nonpher: computational method for design of hard-to-synthesize structures
Status: PubMed-not-MEDLINE; Language: English; Country: England, United Kingdom; Medium: electronic
Document type: journal article
PubMed
29086122
PubMed Central
PMC5359269
DOI
10.1186/s13321-017-0206-2
PII: 10.1186/s13321-017-0206-2
- Keywords
- Molecular complexity, Molecular morphing, Synthetic feasibility
- Publication type
- Journal article
In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinct classes such as active/nonactive or toxic/nontoxic. To train a classifier, the training data set must contain examples of both the positive and the negative class. While biological activity or toxicity can be measured experimentally, another important molecular property, synthetic feasibility, is a more abstract feature that cannot be easily assessed. In the present paper, we introduce Nonpher, a computational method for the construction of a virtual library of hard-to-synthesize structures. Nonpher is based on a molecular morphing algorithm in which new structures are generated iteratively by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures that are not overly complex but are still hard to synthesize. Nonpher results were compared with SAscore and dense region (DR), two other methods for generating hard-to-synthesize compounds. A random forest classifier trained on Nonpher data achieves better results than models trained on SAscore and DR data.
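The morphing idea in the abstract (iterate simple edits such as adding or removing an atom or a bond, and stop before the structure becomes overly complex) can be illustrated with a toy sketch. This is not the Nonpher or Molpher implementation: the bare graph representation, the `complexity` proxy, the `lo`/`hi` thresholds, and all function names are hypothetical choices made for illustration, and no chemical validity (valence, element types) is checked.

```python
import random

# Toy "molecule": a connected graph with integer atom ids and bonds stored
# as (i, j) tuples with i < j. Assumption: no valence or element handling.

def degree(mol, a):
    return sum(1 for b in mol["bonds"] if a in b)

def is_connected(mol):
    atoms = mol["atoms"]
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for i, j in mol["bonds"]:
            nxt = j if i == a else i if j == a else None
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == atoms

def complexity(mol):
    """Hypothetical proxy: 2 * ring count + number of branch points."""
    rings = len(mol["bonds"]) - len(mol["atoms"]) + 1
    branches = sum(1 for a in mol["atoms"] if degree(mol, a) >= 3)
    return 2 * rings + branches

def add_atom(mol, rng):
    """Attach a new atom to a randomly chosen existing atom."""
    anchor = rng.choice(sorted(mol["atoms"]))
    new = max(mol["atoms"]) + 1
    return {"atoms": mol["atoms"] | {new}, "bonds": mol["bonds"] | {(anchor, new)}}

def remove_atom(mol, rng):
    """Remove a terminal atom (degree 1), which keeps the graph connected."""
    leaves = [a for a in sorted(mol["atoms"]) if degree(mol, a) == 1]
    if len(mol["atoms"]) <= 2 or not leaves:
        return mol
    a = rng.choice(leaves)
    return {"atoms": mol["atoms"] - {a},
            "bonds": {b for b in mol["bonds"] if a not in b}}

def add_bond(mol, rng):
    """Bond two atoms that are not yet bonded (closes a ring)."""
    atoms = sorted(mol["atoms"])
    free = [(i, j) for i in atoms for j in atoms
            if i < j and (i, j) not in mol["bonds"]]
    if not free:
        return mol
    return {"atoms": mol["atoms"], "bonds": mol["bonds"] | {rng.choice(free)}}

def remove_bond(mol, rng):
    """Remove a bond only if the graph stays connected (opens a ring)."""
    bonds = sorted(mol["bonds"])
    rng.shuffle(bonds)
    for b in bonds:
        cand = {"atoms": mol["atoms"], "bonds": mol["bonds"] - {b}}
        if is_connected(cand):
            return cand
    return mol

OPERATORS = [add_atom, remove_atom, add_bond, remove_bond]

def morph(start, lo, hi, max_steps=1000, seed=0):
    """Random walk that stops once complexity lies in [lo, hi]."""
    rng = random.Random(seed)
    mol = start
    for step in range(max_steps):
        if lo <= complexity(mol) <= hi:
            return mol, step
        cand = rng.choice(OPERATORS)(mol, rng)
        if complexity(cand) <= hi:  # reject overly complex morphs
            mol = cand
    return None, max_steps

# Usage: start from a six-atom chain and morph into the target band.
chain = {"atoms": set(range(6)), "bonds": {(i, i + 1) for i in range(5)}}
result, steps = morph(chain, lo=4, hi=6)
if result is not None:
    print(steps, complexity(result), len(result["atoms"]))
```

Rejecting any candidate whose score exceeds `hi` is what keeps the walk from overshooting into overly complex structures; a real system would replace the proxy with established complexity measures and operate on valid chemical structures.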