Computational exploration of chemical space is crucial in modern cheminformatics research for accelerating the discovery of new biologically active compounds. In this study, we present a detailed analysis of the chemical library of potential glucocorticoid receptor (GR) ligands generated by the molecular generator, Molpher. To generate the targeted GR library and construct the classification models, structures from the ChEMBL database as well as from the internal IMG library, which was experimentally screened for biological activity in the primary luciferase reporter cell assay, were utilized. The composition of the targeted GR ligand library was compared with a reference library that randomly samples chemical space. A random forest model was used to determine the biological activity of ligands, incorporating its applicability domain using conformal prediction. It was demonstrated that the GR library is significantly enriched with GR ligands compared to the random library. Furthermore, a prospective analysis demonstrated that Molpher successfully designed compounds, which were subsequently experimentally confirmed to be active on the GR. A collection of 34 potential new GR ligands was also identified. Moreover, an important contribution of this study is the establishment of a comprehensive workflow for evaluating computationally generated ligands, particularly those with potential activity against targets that are challenging to dock.
- Keywords
- chemical space, de novo design, glucocorticoid receptor, molecular generation
- MeSH
- small molecule libraries * pharmacology chemistry MeSH
- humans MeSH
- ligands MeSH
- receptors, glucocorticoid * metabolism chemistry MeSH
- Check Tag
- humans MeSH
- Publication type
- journal article MeSH
- Substance names
- small molecule libraries * MeSH
- ligands MeSH
- receptors, glucocorticoid * MeSH
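The GR ligand abstract above mentions that the applicability domain of the random forest model was handled with conformal prediction. The following is only a minimal sketch of class-conditional (Mondrian) inductive conformal classification, assuming the model returns a probability of activity and that a held-out calibration set is available. The function names, the nonconformity measure (1 minus the probability of the assumed label) and all numbers are hypothetical illustrations, not taken from the study.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores >= the test score (with +1 smoothing)."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(prob_active, calib_active, calib_inactive, significance=0.2):
    """Return the labels that cannot be rejected at the given significance level.

    prob_active    -- model probability that the test compound is active
    calib_active   -- nonconformity scores of calibration compounds known to be active
    calib_inactive -- nonconformity scores of calibration compounds known to be inactive
    Nonconformity is taken as 1 - P(assumed label); this is an assumed, common choice.
    """
    labels = set()
    if conformal_p_value(calib_active, 1.0 - prob_active) > significance:
        labels.add("active")
    if conformal_p_value(calib_inactive, prob_active) > significance:
        labels.add("inactive")
    return labels


# Hypothetical calibration scores, for illustration only
calib_active = [0.1, 0.2, 0.3, 0.6]
calib_inactive = [0.05, 0.1, 0.3, 0.7]

confident = prediction_set(0.9, calib_active, calib_inactive)  # single label
ambiguous = prediction_set(0.6, calib_active, calib_inactive)  # both labels
```

A single-label set can be read as a prediction inside the applicability domain at the chosen significance level, while a set with both labels (or none) signals that the model cannot commit to a class for that compound.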
The most widely used QSAR approaches are mainly based on 2D molecular representation, which ignores the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models suffer from the need to pre-align conformers, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study, we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding the cluster membership of its conformers. In our study we used Random Forest, but this representation can be combined with any machine learning method. We compared this approach with conventional 2D and 3D approaches on multiple data sets and investigated the sensitivity of the proposed approach to two tuning parameters: the number of conformers and the number of clusters.
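The bit-string representation described above can be illustrated with a short sketch. It assumes the clustering of conformers has already been done and that each conformer carries an integer cluster index. The function name, molecule identifiers and cluster indices below are hypothetical, and the descriptor calculation and clustering steps themselves are not shown.

```python
def conformer_bitstrings(cluster_labels, n_clusters):
    """Encode each molecule as a bit list marking the clusters its conformers fall into.

    cluster_labels -- dict: molecule id -> list of cluster indices, one per conformer
    n_clusters     -- total number of clusters found on the training set conformers
    """
    bitstrings = {}
    for mol_id, labels in cluster_labels.items():
        bits = [0] * n_clusters
        for label in labels:
            bits[label] = 1  # the bit is set if at least one conformer is in the cluster
        bitstrings[mol_id] = bits
    return bitstrings


# Hypothetical cluster assignments: mol_A has three conformers, mol_B has two
example = conformer_bitstrings({"mol_A": [0, 2, 2], "mol_B": [1, 3]}, n_clusters=4)
print(example["mol_A"])  # [1, 0, 1, 0]
print(example["mol_B"])  # [0, 1, 0, 1]
```

Because every molecule gets a vector of the same length regardless of how many conformers it has, the result can be passed to any learner that accepts fixed-length feature vectors.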
The influence of the molecular structure of different organic compounds on acute toxicity towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis was investigated using the 2D simplex representation of molecular structure and two modelling methods: Random Forest (RF) and Gradient Boosting Machine (GBM). Suitable QSAR (Quantitative Structure-Activity Relationship) models were obtained. The study focused on the interpretation of QSAR models. Its aim was to identify a set of structural fragments that consistently increase toxicity towards Fathead minnow, Daphnia magna, and Tetrahymena pyriformis simultaneously. The interpretation made it possible to gain more detail about known toxicophores and to propose new fragments. The results made it possible to rank the contributions of molecular fragments to various types of toxicity to aquatic organisms. This information can be used for molecular optimization of chemicals. According to the results of structural interpretation, the most significant common mechanisms of the toxic effect of organic compounds on Fathead minnow, Daphnia magna and Tetrahymena pyriformis are nucleophilic substitution reactions and inhibition of oxidative phosphorylation in mitochondria. In addition, acetylcholinesterase and the voltage-gated ion channel of Fathead minnow and Daphnia magna are important targets for toxicants. The on-line version of the OCHEM expert system (https://ochem.eu) was used for a comparative QSAR investigation. The proposed QSAR models comply with the OECD principles and can be used to reliably predict the acute toxicity of organic compounds towards Fathead minnow, Daphnia magna and Tetrahymena pyriformis with allowance for applicability domain estimation.
- Keywords
- Ecotoxicity, Machine Learning, Molecular modelling, QSAR interpretation, Simplex Descriptors
- MeSH
- acetylcholinesterase toxicity MeSH
- Cyprinidae * MeSH
- Daphnia chemistry MeSH
- organic chemicals toxicity MeSH
- Tetrahymena pyriformis * MeSH
- animals MeSH
- Check Tag
- animals MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- acetylcholinesterase MeSH
- organic chemicals MeSH
Here, we report the visualization, analysis and modeling of a large set of 4830 SN2 reactions whose rate constants (logk) were measured under different experimental conditions (solvent, temperature). Each reaction was encoded by a single molecular graph, the Condensed Graph of Reaction, which allowed us to use conventional chemoinformatics techniques developed for individual molecules. A Matched Reaction Pairs approach was thus proposed and used to analyse substituent effects on the reactivity of substrates and nucleophiles. The data were visualized with the Generative Topographic Mapping approach. A consensus Support Vector Regression (SVR) model for the rate constant was built. An unbiased estimate of the model's performance was obtained by cross-validation on reactions measured on unique structural transformations. The model's performance in cross-validation (RMSE=0.61 logk units) and on the external test set (RMSE=0.80) is close to the noise in the data. The performance of local models built for selected subsets of reactions, proceeding in particular solvents or with particular types of nucleophiles, was similar to that of the model built on the entire set. Finally, four different definitions of the model's applicability domain for reactions were examined.
- Keywords
- Condensed Graph of Reaction, Generative Topographic Mapping, Matched Reaction Pairs, Support Vector Regression, bimolecular nucleophilic substitution reactions, models applicability domain
- MeSH
- models, chemical * MeSH
- hydrocarbons, cyclic chemistry MeSH
- kinetics MeSH
- oxidation-reduction MeSH
- support vector machine * MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- hydrocarbons, cyclic MeSH
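The SN2 abstract above reports cross-validation on unique structural transformations, scored by RMSE. One plausible reading is that all measurements of the same transformation (for example in different solvents or at different temperatures) are kept in the same fold, so the model is never tested on a transformation it has already seen. The sketch below illustrates that idea under this assumption. The function names and the transformation identifiers are hypothetical, and the fold assignment is a simple greedy scheme rather than the procedure used in the study.

```python
import math
from collections import defaultdict


def grouped_folds(group_ids, n_folds):
    """Split record indices into folds so that records sharing a group id share a fold."""
    members = defaultdict(list)
    for index, group in enumerate(group_ids):
        members[group].append(index)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[group])
    return folds


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values (e.g. logk)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical transformation ids: T1 measured twice, T2 once, T3 three times
transformations = ["T1", "T1", "T2", "T3", "T3", "T3"]
folds = grouped_folds(transformations, n_folds=2)
print(folds)  # [[3, 4, 5], [0, 1, 2]]
```

Each fold in turn serves as the test set while a model is trained on the remaining folds, and `rmse` is then computed on the held-out predictions.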
The study focused on QSAR model interpretation. The goal was to develop a workflow for identifying molecular fragments whose importance for the modelled property depends on their molecular context. Using a previously established approach - Structural and physicochemical interpretation of QSAR models (SPCI) - fragment contributions were calculated and their relative influence on the compounds' properties was characterised. The distributions of these contributions were analysed using Gaussian mixture modelling to identify groups of compounds (clusters) that contain the same fragment but in which the fragment makes substantially different contributions to the property studied. SMARTSminer was used to detect patterns discriminating these groups of compounds from each other, with visual inspection used where SMARTSminer did not help. The approach was applied to analyse the toxicity, in terms of 40-hour inhibition of growth, of 1984 compounds to Tetrahymena pyriformis. The results showed that the clustering technique correctly identified known toxicophoric patterns: it detected groups of compounds in which fragments have a specific molecular context that makes them contribute substantially more to toxicity. The results show that interpretation of QSAR models can retrieve reasonable patterns even from data sets consisting of compounds with different mechanisms of action, something that is difficult to achieve using conventional pattern/data mining approaches.
- Keywords
- Gaussian Mixture Modeling, QSAR interpretation, pattern mining
- MeSH
- antiprotozoal agents chemistry toxicity MeSH
- data mining methods MeSH
- quantitative structure-activity relationship * MeSH
- drug design * MeSH
- molecular docking simulation methods MeSH
- software MeSH
- Tetrahymena drug effects MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- antiprotozoal agents MeSH
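The QSAR interpretation abstract above uses Gaussian mixture modelling to separate groups of compounds in which the same fragment contributes differently to the property. The sketch below shows the general idea with a two-component, one-dimensional mixture fitted by expectation-maximisation. The abstract does not state the number of components or the software used, so the choice of two components, the function names and the contribution values are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(values, n_iter=100, min_var=1e-6):
    """Fit a two-component 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # initialise the component means at the extremes
    spread = max((hi - lo) ** 2 / 4.0, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for x in values:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances


def assign_clusters(values, weights, means, variances):
    """Label each value with the index of its most probable component."""
    return [
        max(range(2), key=lambda k: weights[k] * normal_pdf(x, means[k], variances[k]))
        for x in values
    ]


# Hypothetical contributions of one fragment across eight compounds
contributions = [0.10, 0.12, 0.09, 0.11, 0.90, 0.95, 0.88, 0.92]
weights, means, variances = fit_two_component_gmm(contributions)
labels = assign_clusters(contributions, weights, means, variances)
```

Compounds assigned to the high-contribution component would then be the candidates for pattern mining against the low-contribution group, as the abstract describes for SMARTSminer.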