Currently, there is very little research aimed at developing medical knowledge extraction tools for major West Slavic languages (Czech, Polish, and Slovak). This project lays the groundwork for a general medical knowledge extraction pipeline, introducing the resource vocabularies available for the respective languages (UMLS resources, ICD-10 translations and national drug databases). It demonstrates the utility of this approach on a case study using a large proprietary corpus of Czech oncology records consisting of more than 40 million words written about more than 4,000 patients. After correlating MedDRA terms found in patients' records with drugs prescribed to them, significant non-obvious associations were found between selected medical conditions being mentioned and the probability of certain drugs being prescribed over the course of the patient's treatment, in some cases increasing the probability of prescriptions by over 250%. This direction of research, producing large amounts of annotated data, is a prerequisite for training deep learning models and predictive systems.
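The reported effect size (an increase in prescription probability of over 250%) is a relative change between two conditional probabilities: P(drug | condition mentioned) against P(drug | condition not mentioned). The sketch below is a minimal illustration under stated assumptions. It assumes a hypothetical per-patient table of two boolean flags; `PatientRecord`, `condition_mentioned` and `drug_prescribed` are illustrative names, not taken from the study. It computes only the relative increase and performs no significance testing. The toy cohort is invented.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical per-patient flags (illustrative, not the study's schema)
    condition_mentioned: bool  # a given MedDRA term appears in the patient's records
    drug_prescribed: bool      # a given drug was prescribed during treatment


def prescription_probability(records: Sequence[PatientRecord], mentioned: bool) -> float:
    """Share of patients in the chosen group who were prescribed the drug."""
    group = [r for r in records if r.condition_mentioned == mentioned]
    if not group:
        raise ValueError("no patients in this group")
    return sum(r.drug_prescribed for r in group) / len(group)


def relative_increase_percent(records: Sequence[PatientRecord]) -> float:
    """Percentage change of P(drug | mentioned) relative to P(drug | not mentioned)."""
    p_with = prescription_probability(records, True)
    p_without = prescription_probability(records, False)
    if p_without == 0:
        raise ValueError("baseline prescription probability is zero")
    return (p_with / p_without - 1.0) * 100.0


# Invented toy cohort: 10 patients with the term (8 prescribed), 10 without (2 prescribed)
cohort = (
    [PatientRecord(True, True)] * 8 + [PatientRecord(True, False)] * 2
    + [PatientRecord(False, True)] * 2 + [PatientRecord(False, False)] * 8
)
print(relative_increase_percent(cohort))
```

For this toy cohort the two probabilities are 0.8 and 0.2. That is a fourfold ratio, or a 300% increase. A real analysis would add a statistical test, for example on the underlying 2×2 contingency table, before calling an association significant.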
- MeSH
- Databases, Pharmaceutical * MeSH
- Language * MeSH
- Medical Oncology MeSH
- Humans MeSH
- International Classification of Diseases MeSH
- Knowledge MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- MeSH
- Databases, Pharmaceutical legislation & jurisprudence MeSH
- Pharmaceutical Preparations * supply & distribution MeSH
- Humans MeSH
- Legislation, Drug MeSH
- Check Tag
- Humans MeSH
- Publication type
- Tables MeSH
- Geographicals
- Czech Republic MeSH
In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database (Pedretti et al. J. Med. Chem. 2018, 61, 1019), a recently published data resource on xenobiotic metabolism that contains more than 2100 substrates annotated with more than 6300 experimentally confirmed SoMs related to redox reactions, hydrolysis and other nonredox reactions, and conjugation reactions. In tests with holdout data, FAME 3 models reached competitive performance, with Matthews correlation coefficients (MCCs) ranging from 0.50 for a global model covering phase 1 and phase 2 metabolism, to 0.75 for a focused model for phase 2 metabolism. A model focused on cytochrome P450 metabolism yielded an MCC of 0.57. Results from case studies with several synthetic compounds, natural products, and natural product derivatives demonstrate the agreement between model predictions and literature data even for molecules with structural patterns clearly distinct from those present in the training data. The applicability domains of the individual models were estimated by a new, atom-based distance measure (FAMEscore) that is based on a nearest-neighbor search in the space of atom environments. FAME 3 is available via a public web service at https://nerdd.zbh.uni-hamburg.de/ and as a self-contained Java software package, free for academic and noncommercial research.
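FAME 3 reports its performance as Matthews correlation coefficients. The snippet below sketches the standard MCC formula; it is not FAME 3's own evaluation code. It assumes per-atom binary labels, with 1 marking an experimentally confirmed SoM. The example molecule and its labels are hypothetical.

```python
import math
from typing import Sequence


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when a marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Counts the four confusion-matrix cells from paired binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences differ in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return matthews_corrcoef(tp, tn, fp, fn)


# Hypothetical molecule with six heavy atoms: two true SoMs, one found, one false alarm
true_soms = [1, 1, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 0]
print(mcc_from_labels(true_soms, predicted))
```

In this example TP = 1, TN = 3, FP = 1 and FN = 1, so MCC = (3 − 1)/√64 = 0.25. MCC ranges from −1 to 1 and uses all four confusion-matrix cells. For this reason it is less misleading than plain accuracy on atom-level data, where most atoms are not SoMs.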
Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models, which now cover both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement with those of Badapple, an established statistical model for the prediction of frequent hitters.
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de, provides user-friendly access not only to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter, as well as to a comprehensive collection of available rule sets for flagging frequent hitters and compounds containing undesired substructures.
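The AUC values quoted for Hit Dexter have a convenient interpretation: AUC is the probability that a randomly chosen positive (promiscuous) compound receives a higher model score than a randomly chosen negative one. The sketch below computes AUC through this pairwise (Mann-Whitney) formulation rather than by tracing the ROC curve. It is not the Hit Dexter code, and the labels and scores are invented.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg),
    which is fine for illustration; rank-based versions scale better.
    """
    if len(y_true) != len(scores):
        raise ValueError("labels and scores differ in length")
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels (1 = promiscuous) and model scores for four compounds
labels = [1, 1, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, model_scores))
```

Here three of the four positive/negative pairs are ordered correctly, because 0.4 falls below 0.5. The AUC is therefore 0.75. A model that cannot separate the classes at all scores 0.5.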
- MeSH
- Databases, Pharmaceutical MeSH
- Small Molecule Libraries chemistry MeSH
- Pharmaceutical Preparations chemistry MeSH
- Models, Molecular MeSH
- Proteins chemistry MeSH
- ROC Curve MeSH
- High-Throughput Screening Assays methods MeSH
- Machine Learning * MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Alzheimer's disease (AD) is one of the most significant neurodegenerative disorders, and its symptoms mostly appear in older people. Catechol-O-methyltransferase (COMT) is one of the known target enzymes implicated in AD. Using 23 known COMT inhibitors, a pharmacophore query was generated and validated by screening it against a database of 1,500 decoys to obtain the Güner-Henry (GH) score and enrichment value. The crucial features of the known inhibitors were evaluated with the online ZINC Pharmer tool to identify new leads from the ZINC database. Five hundred hits were retrieved from ZINC Pharmer and filtered for ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties using FAF-Drug-3, after which 36 molecules were considered for molecular docking. Of the known COMT inhibitors, opicapone, fenoldopam, and quercetin were selected, while ZINC63625100_413, ZINC39411941_412, ZINC63234426_254, ZINC63637968_451, and ZINC64019452_303 were chosen for molecular dynamics simulation analysis on the basis of their high binding affinity and structural recognition. This study identified potential COMT inhibitors through pharmacophore-based inhibitor screening, leading to a more complete understanding of molecular-level interactions.
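The GH score and enrichment value used to validate the pharmacophore query can both be computed from four counts. The sketch below uses the commonly cited Güner-Henry formulation. The abstract does not say which variant was used, so the choice of formula is an assumption. The database size (23 actives plus 1,500 decoys) comes from the abstract. The hit counts are hypothetical, and `ScreeningOutcome` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningOutcome:
    total_compounds: int  # D: actives + decoys in the validation database
    total_actives: int    # A: known actives seeded into the database
    hits: int             # Ht: compounds retrieved by the pharmacophore query
    active_hits: int      # Ha: known actives among the retrieved hits

    def __post_init__(self) -> None:
        a, d, ht, ha = self.total_actives, self.total_compounds, self.hits, self.active_hits
        if not 0 < a < d:
            raise ValueError("need 0 < A < D")
        if not (0 < ht <= d and 0 <= ha <= min(ht, a) and ht - ha <= d - a):
            raise ValueError("inconsistent hit counts")


def enrichment_factor(o: ScreeningOutcome) -> float:
    """EF = (Ha / Ht) / (A / D): active rate in the hit list over the database base rate."""
    return (o.active_hits / o.hits) / (o.total_actives / o.total_compounds)


def gh_score(o: ScreeningOutcome) -> float:
    """GH = [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)], between 0 and 1."""
    a, d, ht, ha = o.total_actives, o.total_compounds, o.hits, o.active_hits
    yield_term = ha * (3 * a + ht) / (4 * ht * a)      # weighted recall/precision
    decoy_penalty = 1 - (ht - ha) / (d - a)            # penalizes retrieved decoys
    return yield_term * decoy_penalty


# 23 actives + 1,500 decoys as in the abstract; 30 hits containing 20 actives is hypothetical
outcome = ScreeningOutcome(total_compounds=23 + 1500, total_actives=23, hits=30, active_hits=20)
print(f"EF = {enrichment_factor(outcome):.1f}, GH = {gh_score(outcome):.3f}")
```

With these hypothetical counts the sketch prints EF = 44.1 and GH = 0.713. A query that retrieves exactly the 23 actives and no decoys reaches the maximum GH of 1 and EF = D/A.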
- MeSH
- Alzheimer Disease drug therapy enzymology physiopathology MeSH
- Gene Expression MeSH
- Databases, Pharmaceutical MeSH
- Catechol O-Methyltransferase Inhibitors chemistry pharmacology MeSH
- Protein Interaction Domains and Motifs MeSH
- Catechol O-Methyltransferase chemistry MeSH
- Kinetics MeSH
- Protein Conformation, alpha-Helical MeSH
- Protein Conformation, beta-Strand MeSH
- Humans MeSH
- Ligands MeSH
- Nootropic Agents chemistry pharmacology MeSH
- High-Throughput Screening Assays * MeSH
- Molecular Dynamics Simulation MeSH
- Molecular Docking Simulation MeSH
- Substrate Specificity MeSH
- Protein Structure, Tertiary MeSH
- Thermodynamics MeSH
- Protein Binding MeSH
- Binding Sites MeSH
- Structure-Activity Relationship MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- MeSH
- Databases, Pharmaceutical MeSH
- Food-Drug Interactions MeSH
- Contraindications, Drug MeSH
- Drug Interactions * MeSH
- Drug Prescriptions standards MeSH
- Drug-Related Side Effects and Adverse Reactions epidemiology etiology MeSH
- Polypharmacy statistics & numerical data MeSH
- Publication type
- Newspaper Article MeSH
- Interview MeSH
- MeSH
- Data Mining methods utilization MeSH
- Databases as Topic MeSH
- European Union MeSH
- Databases, Pharmaceutical * standards utilization legislation & jurisprudence MeSH
- Pharmacovigilance * MeSH
- Integrated Advanced Information Management Systems organization & administration utilization legislation & jurisprudence MeSH
- Humans MeSH
- MEDLARS utilization MeSH
- PubMed utilization MeSH
- Practice Guidelines as Topic MeSH
- Adverse Drug Reaction Reporting Systems * organization & administration trends legislation & jurisprudence MeSH
- Government Agencies organization & administration utilization legislation & jurisprudence MeSH
- Check Tag
- Humans MeSH
- Geographicals
- Czech Republic MeSH
World database centres and their databases are often less accessible to the ordinary user, not because of their commercial nature, but because the user is not aware of them. An overview of database centres that offer medical or pharmaceutical databases, which are often unique and specialized. The role of the medical library in mediating access to these databases.
World database vendors and their databases are usually not well known to typical users of scientific information, because the vendors are fee-based and there are other organisational barriers as well. A brief overview of database vendors that offer medical or pharmaceutical databases, which are often unique and specialized. The role of medical libraries in brokering access. Search and reference services.