
Article

Medvik - BMC


Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters

C. Stork, Y. Chen, M. Šícho, J. Kirchmair

Stork, Conrad
Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
Chen, Ya
Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany
Šícho, Martin
Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany. CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, 166 28 Prague 6, Czech Republic
Kirchmair, Johannes
Center for Bioinformatics (ZBH), Department of Computer Science, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, 20146, Germany. Department of Chemistry, University of Bergen, N-5020 Bergen, Norway. Computational Biology Unit (CBU), University of Bergen, N-5020 Bergen, Norway

Journal of chemical information and modeling. 2019 ; 59 (3) : 1030-1043. [pub] 20190125

J Chem Inf Model
ISSN 1549-960X

Language: English; Country: United States

Document type: Journal Article, Research Support, Non-U.S. Gov't

Persistent link: https://www.medvik.cz/link/bmc20006659

PubMed: 30624935
DOI: 10.1021/acs.jcim.8b00677

MeSH
Databases, Pharmaceutical
Small Molecule Libraries / chemistry
Pharmaceutical Preparations / chemistry
Models, Molecular
Proteins / chemistry
ROC Curve
High-Throughput Screening Assays / methods
Machine Learning*
Protein Binding
Binding Sites
Publication type
Journal Article
Research Support, Non-U.S. Gov't

Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de, not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.
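The abstract reports model performance as Matthews correlation coefficient (MCC) and area under the ROC curve (AUC). As an illustration of what these two numbers measure, the Python sketch below computes both from binary labels (1 = promiscuous, 0 = nonpromiscuous) and model scores. It is not the authors' evaluation code. The function names and the toy data are hypothetical, and AUC is computed by the pairwise-ranking (Mann-Whitney) definition rather than by integrating a plotted curve.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels; hypothetical helper, not part of Hit Dexter."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any row/column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the paper.
    labels = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 0, 1, 1]
    scores = [0.9, 0.4, 0.2, 0.3, 0.8, 0.6]
    print(round(matthews_corrcoef(labels, predicted), 3))  # 0.333
    print(round(roc_auc(labels, scores), 3))  # 0.889
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it informative on imbalanced sets. AUC ranges from 0 to 1 and depends only on how the scores rank positives against negatives, independent of any classification threshold.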


000: 00000naa a2200000 a 4500

001: bmc20006659

003: CZ-PrNML

005: 20200526083403.0

007: ta

008: 200511s2019 xxu f 000 0|eng||

009: AR

024 7_: $a 10.1021/acs.jcim.8b00677 $2 doi

035 __: $a (PubMed)30624935

040 __: $a ABA008 $b cze $d ABA008 $e AACR2

041 0_: $a eng

044 __: $a xxu

100 1_: $a Stork, Conrad $u Center for Bioinformatics (ZBH), Department of Computer Science , Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg , Hamburg , 20146 , Germany.

245 10: $a Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters / $c C. Stork, Y. Chen, M. Šícho, J. Kirchmair,

520 9_: $a Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. 
The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de , not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.

650 _2: $a Binding Sites $7 D001665

650 _2: $a Databases, Pharmaceutical $7 D062313

650 _2: $a High-Throughput Screening Assays $x methods $7 D057166

650 12: $a Machine Learning $7 D000069550

650 _2: $a Models, Molecular $7 D008958

650 _2: $a Pharmaceutical Preparations $x chemistry $7 D004364

650 _2: $a Protein Binding $7 D011485

650 _2: $a Proteins $x chemistry $7 D011506

650 _2: $a ROC Curve $7 D012372

650 _2: $a Small Molecule Libraries $x chemistry $7 D054852

655 _2: $a Journal Article $7 D016428

655 _2: $a Research Support, Non-U.S. Gov't $7 D013485

700 1_: $a Chen, Ya $u Center for Bioinformatics (ZBH), Department of Computer Science , Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg , Hamburg , 20146 , Germany.

700 1_: $a Šícho, Martin $u Center for Bioinformatics (ZBH), Department of Computer Science , Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg , Hamburg , 20146 , Germany. CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology , University of Chemistry and Technology Prague , 166 28 Prague 6 , Czech Republic.

700 1_: $a Kirchmair, Johannes $u Center for Bioinformatics (ZBH), Department of Computer Science , Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg , Hamburg , 20146 , Germany. Department of Chemistry , University of Bergen , N-5020 Bergen , Norway. Computational Biology Unit (CBU) , University of Bergen , N-5020 Bergen , Norway.

773 0_: $w MED00008945 $t Journal of chemical information and modeling $x 1549-960X $g Vol. 59, no. 3 (2019), p. 1030-1043

856 41: $u https://pubmed.ncbi.nlm.nih.gov/30624935 $y Pubmed

910 __: $a ABA008 $b sig $c sign $y a $z 0

990 __: $a 20200511 $b ABA008

991 __: $a 20200526083359 $b ABA008

999 __: $a ok $b bmc $g 1525517 $s 1096715

BAS __: $a 3

BAS __: $a PreBMC

BMC __: $a 2019 $b 59 $c 3 $d 1030-1043 $e 20190125 $i 1549-960X $m Journal of chemical information and modeling $n J Chem Inf Model $x MED00008945

LZP __: $a Pubmed-20200511
