SYBA: Bayesian estimation of synthetic accessibility of organic compounds
Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic
Document type: journal article
Grant support
RVO 68378050-KAV-NPUI
Ministerstvo Školství, Mládeže a Tělovýchovy
LM2018130
Ministerstvo Školství, Mládeže a Tělovýchovy
20/2015
Ministerstvo Školství, Mládeže a Tělovýchovy
CZ.02.1.01/0.0/0.0/16_019/0000785
Operational Programme Research, Development and Education
PubMed
33431015
PubMed Central
PMC7238540
DOI
10.1186/s13321-020-00439-2
PII: 10.1186/s13321-020-00439-2
- Keywords
- Bayesian analysis, Bernoulli naïve Bayes, Synthetic accessibility
- Publication type
- journal article
SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy-to-synthesize (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that assigns SYBA score contributions to individual fragments according to their frequencies in databases of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, used as a baseline method, and with two other methods for synthetic accessibility assessment: SAScore and SCScore. When the methods are used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, when the SAScore threshold is optimized (it changes from 6.0 to approximately 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based solely on fragment contributions, it can be used to analyze how individual parts of a molecule contribute to its synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
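The fragment-based scoring described in the abstract can be illustrated with a minimal sketch of a Bernoulli naïve Bayes log-odds score. This is not the SYBA implementation: the function names `fit_fragment_scores` and `score_molecule`, the Laplace smoothing parameter `alpha`, the assumption of equal class priors, and the toy fragment labels are all hypothetical and chosen only for illustration. Each molecule is represented as a set of fragment identifiers, and every fragment receives one contribution for being present and another for being absent, derived from its frequency among ES and HS training molecules.

```python
import math


def fit_fragment_scores(es_mols, hs_mols, alpha=1.0):
    """Estimate per-fragment log-odds contributions (illustrative sketch).

    es_mols, hs_mols: lists of sets of fragment identifiers.
    alpha: Laplace smoothing constant (an assumption, not taken from the paper).
    Returns {fragment: (score_if_present, score_if_absent)}.
    """
    vocab = set().union(*es_mols, *hs_mols)
    n_es, n_hs = len(es_mols), len(hs_mols)
    scores = {}
    for frag in vocab:
        # Smoothed probability that a molecule of each class contains the fragment.
        p_es = (sum(frag in m for m in es_mols) + alpha) / (n_es + 2 * alpha)
        p_hs = (sum(frag in m for m in hs_mols) + alpha) / (n_hs + 2 * alpha)
        scores[frag] = (
            math.log(p_es / p_hs),              # contribution when present
            math.log((1 - p_es) / (1 - p_hs)),  # contribution when absent
        )
    return scores


def score_molecule(fragments, scores):
    """Sum contributions over all known fragments (equal class priors assumed).

    A positive total favours the ES class, a negative total the HS class.
    Fragments never seen in training are ignored in this sketch.
    """
    return sum(
        present if frag in fragments else absent
        for frag, (present, absent) in scores.items()
    )


# Toy training data with hypothetical fragment labels.
es_training = [{"amide", "benzene"}, {"benzene", "ester"}, {"amide"}]
hs_training = [{"spiro", "bridged"}, {"bridged", "benzene"}, {"spiro"}]

fragment_scores = fit_fragment_scores(es_training, hs_training)
easy_score = score_molecule({"amide", "benzene"}, fragment_scores)
hard_score = score_molecule({"spiro", "bridged"}, fragment_scores)
print(easy_score, hard_score)
```

Because the total is a plain sum, the per-fragment terms in `fragment_scores` can be inspected individually, which mirrors the abstract's point that fragment contributions allow the analysis of individual molecular parts.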