SYBA: Bayesian estimation of synthetic accessibility of organic compounds

2020 May 20; 12(1): 35. [Epub 2020 May 20]

Status: PubMed-not-MEDLINE. Language: English. Country: England, United Kingdom. Medium: electronic.

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid33431015

Grant support
RVO 68378050-KAV-NPUI Ministry of Education, Youth and Sports (Ministerstvo školství, mládeže a tělovýchovy)
LM2018130 Ministry of Education, Youth and Sports (Ministerstvo školství, mládeže a tělovýchovy)
20/2015 Ministry of Education, Youth and Sports (Ministerstvo školství, mládeže a tělovýchovy)
CZ.02.1.01/0.0/0.0/16_019/0000785 Operational Programme Research, Development and Education

Links

PubMed 33431015
PubMed Central PMC7238540
DOI 10.1186/s13321-020-00439-2
PII: 10.1186/s13321-020-00439-2

SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy-to-synthesize (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that assigns SYBA score contributions to individual fragments according to their frequencies in databases of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, used as a baseline method, and with two other methods for synthetic accessibility assessment: SAScore and SCScore. When each method is used with its suggested threshold, SYBA improves on random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, once the SAScore threshold is optimized (it shifts from 6.0 to approximately 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based solely on fragment contributions, it can be used to analyze how individual parts of a molecule contribute to the compound's synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
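The fragment-scoring idea in the abstract can be illustrated with a minimal sketch of a Bernoulli naïve Bayes log-odds score. This is not the SYBA implementation or its API. The function names, the Laplace smoothing constant `alpha`, the equal-prior assumption and the toy fragment labels are all assumptions made for illustration. A real workflow would derive fragments from molecular structures with a cheminformatics toolkit.

```python
import math
from collections import Counter


def fit_fragment_scores(es_mols, hs_mols, alpha=1.0):
    """Estimate per-fragment log-odds contributions (hypothetical helper).

    es_mols / hs_mols: lists of fragment sets, one set per molecule.
    Returns {fragment: (score_if_present, score_if_absent)}.
    alpha is an assumed Laplace smoothing constant.
    """
    n_es, n_hs = len(es_mols), len(hs_mols)
    # Count in how many molecules of each class a fragment occurs.
    es_counts = Counter(f for mol in es_mols for f in set(mol))
    hs_counts = Counter(f for mol in hs_mols for f in set(mol))
    scores = {}
    for frag in set(es_counts) | set(hs_counts):
        p_es = (es_counts[frag] + alpha) / (n_es + 2 * alpha)
        p_hs = (hs_counts[frag] + alpha) / (n_hs + 2 * alpha)
        scores[frag] = (
            math.log(p_es / p_hs),              # fragment present
            math.log((1 - p_es) / (1 - p_hs)),  # fragment absent
        )
    return scores


def score_molecule(mol, scores, log_prior_ratio=0.0):
    """Sum the contributions of all known fragments.

    Positive totals favour ES, negative totals favour HS.
    log_prior_ratio = 0.0 assumes equal class priors.
    Fragments never seen in training are ignored.
    """
    frags = set(mol)
    total = log_prior_ratio
    for frag, (present, absent) in scores.items():
        total += present if frag in frags else absent
    return total


# Toy data: fragment labels are placeholders, not real SYBA fragments.
es_train = [{"benzene", "amide"}, {"benzene", "ether"}, {"amide", "ether"}]
hs_train = [{"spiro", "bridged"}, {"spiro", "ether"}, {"bridged", "amide"}]

frag_scores = fit_fragment_scores(es_train, hs_train)
easy_score = score_molecule({"benzene", "ether"}, frag_scores)
hard_score = score_molecule({"spiro", "bridged"}, frag_scores)
print(f"easy-like molecule: {easy_score:+.2f}")
print(f"hard-like molecule: {hard_score:+.2f}")
```

Because the total is a plain sum, the entries of `frag_scores` can be inspected one by one to see which fragments push a molecule toward ES or HS, which is the kind of per-fragment analysis the abstract describes.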


Most recent citing articles

Profiling and analysis of chemical compounds using pointwise mutual information

2021 Jan 10; 13(1): 3. [Epub 2021 Jan 10]
