Small molecule databases
BACKGROUND: In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. RESULTS: We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. CONCLUSIONS: Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
- Keywords
- Database of small molecules, Resource Description Framework, SPARQL query language
- Publication type
- Journal Article MeSH
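The ontology check mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a hypothetical mini-ontology (the `ex:` predicates and classes are invented for illustration) that maps each predicate to a domain and a range class, and it flags triple patterns that use an unknown predicate or that force one variable into two different classes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical ontology (assumption): predicate -> (domain class, range class).
ONTOLOGY = {
    "ex:hasFormula": ("ex:Compound", "xsd:string"),
    "ex:hasRole": ("ex:Compound", "ex:Role"),
    "ex:roleLabel": ("ex:Role", "xsd:string"),
}


def check_query(patterns):
    """Return warnings for triple patterns that contradict the toy ontology."""
    warnings = []
    var_types = {}  # variable name -> class inferred from its first use

    def bind(term, cls, pattern):
        # Only variables (terms starting with "?") are type-checked here.
        if not term.startswith("?"):
            return
        known = var_types.setdefault(term, cls)
        if known != cls:
            warnings.append(
                f"{term} used as both {known} and {cls} in {pattern.predicate}"
            )

    for pattern in patterns:
        if pattern.predicate not in ONTOLOGY:
            warnings.append(f"unknown predicate {pattern.predicate}")
            continue
        domain, range_ = ONTOLOGY[pattern.predicate]
        bind(pattern.subject, domain, pattern)
        bind(pattern.obj, range_, pattern)
    return warnings


# ?c is first a compound (subject of ex:hasRole), then used as a role.
suspicious = check_query([
    Triple("?c", "ex:hasRole", "?r"),
    Triple("?c", "ex:roleLabel", "?l"),
])
```

A real checker would also have to handle class hierarchies, property paths and optional patterns; this sketch only shows the basic idea of inferring variable types from predicates and reporting conflicts.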
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluation of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/ .
- Keywords
- Resource Description Framework, SPARQL, Small-molecule datasets
- Publication type
- Journal Article MeSH
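The idea of translating a SPARQL query into an equivalent SQL query, as described in the IDSM abstract above, can be sketched on a toy scale. The sketch below makes strong simplifying assumptions: it stores everything in a single `triples(s, p, o)` table in SQLite, whereas the abstract says that IDSM keeps data in traditional relational form in PostgreSQL, and it handles only a basic graph pattern (a list of triple patterns). The function name `bgp_to_sql` and the `ex:` identifiers are hypothetical.

```python
import sqlite3


def bgp_to_sql(patterns, select_vars):
    """Translate triple patterns into one SQL self-join over triples(s, p, o)."""
    froms, wheres, params = [], [], []
    first_seen = {}  # variable -> column reference where it first appeared
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"triples AS {alias}")
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"{alias}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    wheres.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a parameterised filter.
                wheres.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select_vars)
    sql = f"SELECT {columns} FROM {', '.join(froms)}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:c1", "rdf:type", "ex:Compound"),
    ("ex:c1", "ex:formula", "C6H6"),
    ("ex:c2", "rdf:type", "ex:Compound"),
    ("ex:c2", "ex:formula", "CH4"),
    ("ex:r1", "rdf:type", "ex:Role"),
])

# Roughly: SELECT ?c ?f WHERE { ?c rdf:type ex:Compound . ?c ex:formula ?f }
sql, params = bgp_to_sql(
    [("?c", "rdf:type", "ex:Compound"), ("?c", "ex:formula", "?f")],
    ["?c", "?f"],
)
rows = sorted(conn.execute(sql, params).fetchall())
```

Each triple pattern becomes one table alias, shared variables become join conditions, and constants become filters. A production engine would additionally map predicates to dedicated tables and optimise the generated SQL, which this sketch does not attempt.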
Skin sensitization potential or potency is an important end point in the safety assessment of new chemicals and new chemical mixtures. Formerly, animal experiments such as the local lymph node assay (LLNA) were the main form of assessment. Today, however, the focus lies on the development of nonanimal testing approaches (i.e., in vitro and in chemico assays) and computational models. In this work, we investigate, based on publicly available LLNA data, the ability of aggregated, Mondrian conformal prediction classifiers to differentiate between non-sensitizing and sensitizing compounds as well as between two levels of skin sensitization potential (weak to moderate sensitizers, and strong to extreme sensitizers). The advantage of the conformal prediction framework over other modeling approaches is that it assigns compounds to activity classes only if a defined minimum level of confidence is reached for the individual predictions. This eliminates the need for applicability domain criteria that often are arbitrary in their nature and less flexible. Our new binary classifier, named Skin Doctor CP, differentiates nonsensitizers from sensitizers with a higher reliability-to-efficiency ratio than the corresponding nonconformal prediction workflow that we presented earlier. When tested on a set of 257 compounds at the significance levels of 0.10 and 0.30, the model reached an efficiency of 0.49 and 0.92, and an accuracy of 0.83 and 0.75, respectively. In addition, we developed a ternary classification workflow to differentiate nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers.
Although this model achieved satisfactory overall performance (accuracies of 0.90 and 0.73, and efficiencies of 0.42 and 0.90, at significance levels 0.10 and 0.30, respectively), it did not obtain satisfying class-wise results (at a significance level of 0.30, the validities obtained for nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers were 0.70, 0.58, and 0.63, respectively). We argue that the model is, in consequence, unable to reliably identify strong to extreme sensitizers and suggest that other ternary models derived from the currently accessible LLNA data might suffer from the same problem. Skin Doctor CP is available via a public web service at https://nerdd.zbh.uni-hamburg.de/skinDoctorII/.
- MeSH
- Databases, Factual MeSH
- Small Molecule Libraries chemistry pharmacology MeSH
- Skin Tests * MeSH
- Skin drug effects MeSH
- Molecular Structure MeSH
- Mice MeSH
- Organic Chemicals chemistry pharmacology MeSH
- Local Lymph Node Assay MeSH
- Animals MeSH
- Check Tag
- Mice MeSH
- Animals MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Small Molecule Libraries MeSH
- Organic Chemicals MeSH
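The Mondrian conformal prediction step described in the Skin Doctor CP abstract above can be sketched in a few lines. This is a generic illustration of class-conditional (Mondrian) inductive conformal prediction, not the authors' workflow: the nonconformity scores are invented, and efficiency is taken here to mean the fraction of single-class predictions and validity the fraction of prediction sets containing the true class, which are common definitions but assumptions with respect to the paper.

```python
def mondrian_p_values(calibration, test_scores):
    """Class-conditional p-values for one test compound.

    calibration: {label: nonconformity scores of calibration compounds
                  whose true class is that label}
    test_scores: {label: nonconformity score of the test compound under
                  the hypothesis that it belongs to that label}
    """
    p_values = {}
    for label, alpha in test_scores.items():
        scores = calibration[label]
        at_least_as_strange = sum(1 for s in scores if s >= alpha)
        p_values[label] = (at_least_as_strange + 1) / (len(scores) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every class whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def efficiency(sets):
    """Fraction of prediction sets that contain exactly one class."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


def validity(sets, truths):
    """Fraction of prediction sets that contain the true class."""
    return sum(1 for s, t in zip(sets, truths) if t in s) / len(sets)


# Hypothetical calibration scores (e.g. 1 - predicted class probability).
calibration = {
    "sensitizer": [0.1, 0.2, 0.3, 0.6],
    "nonsensitizer": [0.2, 0.4, 0.5, 0.9],
}
p = mondrian_p_values(calibration, {"sensitizer": 0.2, "nonsensitizer": 0.95})
strict = prediction_set(p, significance=0.10)   # both classes remain
relaxed = prediction_set(p, significance=0.30)  # only "sensitizer" remains
```

In this toy case the compound receives both labels at a significance level of 0.10 and a single label at 0.30, which mirrors the trade-off reported in the abstract: a higher significance level yields more single-class predictions at the cost of a higher tolerated error rate.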
BACKGROUND: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. RESULTS: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. CONCLUSIONS: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
- Keywords
- Inverted indices, Molecule cartridges, Small molecule databases, Substructure search
- Publication type
- Journal Article MeSH
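The screening principle behind the Sachem abstract above, in which fingerprints stored in inverted indices reduce the number of calls to the verification procedure, can be sketched as follows. The feature keys and molecules are hypothetical and far simpler than OrChem or Sachem fingerprints, and the `verify` callback stands in for a real subgraph-isomorphism check.

```python
from collections import defaultdict


def build_inverted_index(fingerprints):
    """fingerprints: {molecule id: set of feature keys} -> {feature: set of ids}."""
    index = defaultdict(set)
    for mol_id, features in fingerprints.items():
        for feature in features:
            index[feature].add(mol_id)
    return index


def screen(index, all_ids, query_features):
    """Ids whose fingerprint contains every query feature.

    Screening may return false positives but never drops a true match,
    provided every substructure match implies all query features.
    """
    candidates = set(all_ids)
    # Intersect the shortest posting lists first to shrink the set early.
    for feature in sorted(query_features, key=lambda f: len(index.get(f, ()))):
        candidates &= index.get(feature, set())
        if not candidates:
            break
    return candidates


def substructure_search(index, all_ids, query_features, verify):
    """Run the expensive verification only on screened candidates."""
    return sorted(m for m in screen(index, all_ids, query_features) if verify(m))


# Hypothetical feature sets, for illustration only.
fingerprints = {
    "benzene": {"ring6", "aromatic"},
    "phenol": {"ring6", "aromatic", "O-H"},
    "ethanol": {"O-H", "C-C"},
    "cyclohexanol": {"ring6", "O-H"},
}
index = build_inverted_index(fingerprints)

calls = []  # records which molecules reached the verification step


def verify(mol_id):
    calls.append(mol_id)
    return True  # stand-in for a real atom-by-atom match


hits = substructure_search(index, fingerprints, {"aromatic", "O-H"}, verify)
```

Only one of the four molecules reaches `verify` here, which is the screen-out effect the abstract refers to. With an inverted index the cost depends on the lengths of the posting lists for the query features and not on a scan over every stored fingerprint.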
BACKGROUND: Quercetin is a natural flavonoid that occurs widely in nature, for example in tea, coffee, apples, and onions. Numerous studies have shown that quercetin has multiple biological activities, such as antioxidant, anti-inflammatory, and anti-aging effects. Hence, quercetin has a significant therapeutic effect on cancers, obesity, diabetes, and other diseases. In the past decades, a large number of studies have shown that quercetin combined with other agents can significantly improve the overall therapeutic effect, compared to single use. PURPOSE: This work reviews the pharmacological activities of quercetin and its derivatives. In addition, this work also summarizes both in vivo and in vitro experimental evidence for the synergistic effect of quercetin against cancers and metabolic diseases. METHODS: An extensive systematic search for pharmacological activities and synergistic effect of quercetin was performed considering all the relevant literature published until August 2021 through the databases including NCBI PubMed, Scopus, Web of Science, and Google Scholar. The relevant literature was extracted from the databases with the following keyword combinations: "pharmacological activities" OR "biological activities" OR "synergistic effect" OR "combined" OR "combination" AND "quercetin" as well as free-text words. RESULTS: Quercetin and its derivatives possess multiple pharmacological activities including anti-cancer, anti-oxidant, anti-inflammatory, anti-cardiovascular, anti-aging, and neuroprotective activities. In addition, the synergistic effect of quercetin with small molecule agents against cancers and metabolic diseases has also been confirmed. CONCLUSION: Quercetin cooperates with agents to improve the therapeutic effect by regulating signal molecules and blocking the cell cycle. Synergistic therapy can reduce the dose of agents and avoid the possible toxic and side effects in the treatment process.
Although quercetin treatment has some potential side effects, it is safe under the expected use conditions. Hence, quercetin has application value and potential strength as a clinical drug. Furthermore, quercetin, as the main effective therapeutic ingredient in traditional Chinese medicine, may effectively treat and prevent coronavirus disease 2019 (COVID-19).
- Keywords
- Combination therapy, Pharmacological activities, Quercetin, Synergistic effect
- MeSH
- Antioxidants pharmacology MeSH
- COVID-19 * MeSH
- Humans MeSH
- Quercetin * pharmacology MeSH
- Plant Extracts MeSH
- SARS-CoV-2 MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Substance names
- Antioxidants MeSH
- Quercetin * MeSH
- Plant Extracts MeSH
MOTIVATION: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to efficiently utilize these resources. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of the chemical databases still lack the functionality of structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space. RESULTS: We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies writing of heterogeneous queries that include chemical-structure search terms. AVAILABILITY: The service is freely available and accessible using a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick explorative querying of datasets are available at https://idsm.elixir-czech.cz .
- Keywords
- Interoperability, Linked data, Small molecule databases, Substructure search
- Publication type
- Journal Article MeSH
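Similarity search over small-molecule fingerprints, one of the two search modes named in the abstract above, is commonly scored with the Tanimoto coefficient. The sketch below assumes that measure and uses made-up bit-vector fingerprints held in Python integers; the abstract does not state which similarity measure or fingerprint the service uses.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Shared bits divided by bits set in either fingerprint."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def similarity_search(fingerprints, query, threshold):
    """Return (molecule id, score) pairs with score >= threshold, best first."""
    scored = ((mol_id, tanimoto(fp, query)) for mol_id, fp in fingerprints.items())
    hits = [(mol_id, score) for mol_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical 6-bit fingerprints, for illustration only.
fingerprints = {"A": 0b101101, "B": 0b101001, "C": 0b010010}
hits = similarity_search(fingerprints, query=0b101101, threshold=0.7)
```

Molecule A matches the query exactly (score 1.0), B shares three of four set bits (0.75), and C shares none, so only A and B pass the 0.7 threshold.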
BACKGROUND: MicroRNAs are small non-coding one-stranded RNA molecules that play an important role in the post-transcriptional regulation of genes. Bioinformatic predictions indicate that each miRNA can regulate hundreds of target genes. MicroRNA expression can be associated with various cellular processes leading to the metastasis of malignant tumours including non-small cell lung carcinoma. This review summarizes current knowledge on the role of microRNAs in NSCLC metastasis to the brain and lymph nodes. METHODS: A search of the NCBI/PubMed database for publications on expression levels and the mechanisms of microRNA action in NSCLC metastasis. RESULTS AND CONCLUSION: Dysregulation of microRNAs in NSCLC can be associated with brain and lymph node metastasis. There are differences in microRNA expression profiling between NSCLC with and without metastases but it is currently not possible to reliably predict the site of metastasis in NSCLC. Based on data from RNA microarrays, bioinformatics analysis is able to predict the target genes of highlighted microRNAs, providing us with complex information about cancer cell features such as enhanced proliferation, migration and invasion. Such microRNAs may then be knocked down using siRNAs or substituted with miRNA mimics. RNA microarray profiling may thus be a useful tool to select up- or down-regulated microRNAs. A number of authors suggest that microRNAs could serve as biomarkers and therapeutic targets in the treatment of NSCLC metastasis.
- Keywords
- brain metastasis, lymph node metastasis, microRNA, non-small cell lung carcinoma
- MeSH
- Down-Regulation MeSH
- Humans MeSH
- Lymphatic Metastasis MeSH
- Neoplasm Metastasis MeSH
- MicroRNAs physiology MeSH
- Cell Line, Tumor MeSH
- Bone Neoplasms secondary MeSH
- Brain Neoplasms secondary MeSH
- Lung Neoplasms etiology MeSH
- Carcinoma, Non-Small-Cell Lung etiology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Review MeSH
- Substance names
- MicroRNAs MeSH
Several therapeutic monoclonal antibodies approved by the FDA are available against the PD-1/PD-L1 (programmed death 1/programmed death ligand 1) immune checkpoint axis, which has been an unprecedented success in cancer treatment. However, existing therapeutics against PD-L1, including small molecule inhibitors, have certain drawbacks such as high cost and drug resistance that challenge the currently available anti-PD-L1 therapy. Therefore, this study presents the screening of 32,552 compounds from the Natural Product Atlas database against PD-L1, including three steps of structure-based virtual screening followed by binding free energy to refine the ideal conformation of potent PD-L1 inhibitors. Subsequently, five natural compounds, i.e., Neoenactin B1, Actinofuranone I, Cosmosporin, Ganocapenoid A, and 3-[3-hydroxy-4-(3-methylbut-2-enyl)phenyl]-5-(4-hydroxybenzyl)-4-methyldihydrofuran-2(3H)-one, were collected based on the ADMET (absorption, distribution, metabolism, excretion, and toxicity) profiling and binding free energy (>−60 kcal/mol) for further computational investigation in comparison to co-crystallized ligand, i.e., JQT inhibitor. Based on interaction mapping, explicit 100 ns molecular dynamics simulation, and end-point binding free energy calculations, the selected natural compounds were marked for substantial stability with PD-L1 via intermolecular interactions (hydrogen and hydrophobic) with essential residues in comparison to the JQT inhibitor. Collectively, the calculated results advocate the selected natural compounds as the putative potent inhibitors of PD-L1 and, therefore, can be considered for further development of PD-L1 immune checkpoint inhibitors in cancer immunotherapy.
- Keywords
- Neoenactin B1, immunotherapy, molecular dynamics simulation, natural products, programmed death ligand 1
- Publication type
- Journal Article MeSH
Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de , not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.
- MeSH
- Databases, Pharmaceutical MeSH
- Small Molecule Libraries chemistry MeSH
- Pharmaceutical Preparations chemistry MeSH
- Models, Molecular MeSH
- Proteins chemistry MeSH
- ROC Curve MeSH
- High-Throughput Screening Assays methods MeSH
- Machine Learning * MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Small Molecule Libraries MeSH
- Pharmaceutical Preparations MeSH
- Proteins MeSH
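The two performance measures quoted in the Hit Dexter abstract above, the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC), can be computed from first principles as sketched below. The counts and scores are invented for illustration and are not taken from the paper; the AUC is computed with the pairwise ranking formulation, which is equivalent to the area under the ROC curve.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from the four cells of a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical results for a promiscuous vs. nonpromiscuous classifier.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction) and remains informative when the classes are imbalanced, which is one reason it is often reported alongside AUC for frequent-hitter models.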