Most cited: 29086122

2 citations in PubMed

Most cited article - PubMed ID 29086122

Nonpher: computational method for design of hard-to-synthesize structures

Journal of cheminformatics. 2017 Mar 20 ; 9 (1) : 20. [epub] 20170320

J Cheminform
ISSN 1758-2946
Source

Article

Profiling and analysis of chemical compounds using pointwise mutual information

Čmelo, I
Author affiliation: CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
Voršilák, M
Author affiliations: CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic; CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
Svozil, D
Author affiliations: CZ-OPENSCREEN National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic; CZ-OPENSCREEN National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic. Contact: Daniel.Svozil@vscht.cz

Journal of cheminformatics. 2021 Jan 10 ; 13 (1) : 3. [epub] 20210110

J Cheminform
ISSN 1758-2946
Source

Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of the association strength between compound structural features, resulting in database PMI interrelation profiles. The structural features are substructure fragments obtained by encoding individual compounds as MACCS, PubChemKey and ECFP fingerprints. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well a given compound's feature combinations fit those in a particular compound set, is applied to the analysis of compound synthetic accessibility (SA), as well as to the classification of compounds as easy (ES) or hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (accuracy: ZRFT 94.5%, SYBA 98.8%, SAScore 99.0%, random forest 97.3%). However, the ability of ZRFT to distinguish between ES and HS compounds is surprisingly high considering that, while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measure that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to the physico-chemical properties of organic compounds.
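To make the central quantity concrete, the following is a minimal sketch of pairwise PMI over a compound set, using the standard definition PMI(a, b) = log2(p(a, b) / (p(a) p(b))). It is not the authors' implementation: the function name `pmi_table`, the feature labels, and the four-compound toy set are hypothetical, and each fingerprint is assumed to be reduced to a set of feature identifiers. ZRFT itself is not reproduced here, since the abstract does not give its formula.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(fingerprints):
    """Return {(a, b): PMI in bits} for every feature pair seen together at least once.

    `fingerprints` is a list of sets of feature identifiers (hypothetical stand-ins
    for the "on" bits of a MACCS, PubChemKey or ECFP fingerprint).
    """
    n = len(fingerprints)
    single = Counter()  # number of compounds containing each feature
    pair = Counter()    # number of compounds containing both features of a pair
    for fp in fingerprints:
        feats = sorted(set(fp))  # sort so each pair has one canonical order
        single.update(feats)
        pair.update(combinations(feats, 2))

    table = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        p_a = single[a] / n
        p_b = single[b] / n
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Toy compound set (assumed data, for illustration only).
toy = [
    {"amide", "carboxyl"},
    {"amide", "carboxyl"},
    {"amide", "phenyl"},
    {"phenyl"},
]
pmi = pmi_table(toy)
for features, value in sorted(pmi.items()):
    print(features, round(value, 3))
```

A positive value means the two features co-occur more often than expected if they were independent, and a negative value means less often. Pairs that never co-occur are omitted from the table because their PMI is undefined (log of zero).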

Keywords
Hashed fingerprint, Information theory, Pointwise mutual information, Structural key, Synthetic accessibility
Publication type
Journal Article (MeSH)

Article

SYBA: Bayesian estimation of synthetic accessibility of organic compounds

Journal of cheminformatics. 2020 May 20 ; 12 (1) : 35. [epub] 20200520

J Cheminform
ISSN 1758-2946
Source

SYBA (SYnthetic Bayesian Accessibility) is a fragment-based method for the rapid classification of organic compounds as easy- (ES) or hard-to-synthesize (HS). It is based on a Bernoulli naïve Bayes classifier that is used to assign SYBA score contributions to individual fragments based on their frequencies in the database of ES and HS molecules. SYBA was trained on ES molecules available in the ZINC15 database and on HS molecules generated by the Nonpher methodology. SYBA was compared with a random forest, which was used as a baseline method, as well as with two other methods for synthetic accessibility assessment: SAScore and SCScore. When used with their suggested thresholds, SYBA improves over random forest classification, albeit marginally, and outperforms SAScore and SCScore. However, upon optimization of the SAScore threshold (which changes from 6.0 to approximately 4.5), SAScore yields results similar to those of SYBA. Because SYBA is based merely on fragment contributions, it can be used to analyze the contribution of individual molecular parts to compound synthetic accessibility. SYBA is publicly available at https://github.com/lich-uct/syba under the GNU General Public License.
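The idea of per-fragment score contributions can be illustrated with a simplified sketch of naïve Bayes log-odds. This is not the SYBA code from the repository above: the function names, the fragment labels, the counts, the Laplace smoothing, and the sign convention (positive leans ES, negative leans HS) are all assumptions made for illustration. A full Bernoulli naïve Bayes model also adds terms for fragments that are absent from a molecule; those are omitted here for brevity.

```python
import math


def fragment_scores(es_counts, hs_counts, n_es, n_hs):
    """Log-odds contribution of each fragment when it is present in a molecule.

    es_counts / hs_counts map a fragment to the number of ES / HS training
    molecules that contain it; n_es / n_hs are the training set sizes.
    """
    scores = {}
    for frag in set(es_counts) | set(hs_counts):
        # Laplace smoothing (+1 / +2) keeps the probabilities away from 0 and 1.
        p_es = (es_counts.get(frag, 0) + 1) / (n_es + 2)
        p_hs = (hs_counts.get(frag, 0) + 1) / (n_hs + 2)
        scores[frag] = math.log(p_es / p_hs)
    return scores


def molecule_score(fragments, scores):
    """Sum the contributions of the fragments present; unknown fragments add 0."""
    return sum(scores.get(frag, 0.0) for frag in set(fragments))


# Hypothetical fragment counts from 100 ES and 100 HS molecules.
es_counts = {"benzene": 90, "spiro_bridge": 1}
hs_counts = {"benzene": 40, "spiro_bridge": 60}
scores = fragment_scores(es_counts, hs_counts, n_es=100, n_hs=100)

print(molecule_score(["benzene"], scores))                  # positive: leans ES
print(molecule_score(["benzene", "spiro_bridge"], scores))  # negative: leans HS
```

Because the molecule score is a plain sum, each fragment's term can be inspected on its own, which is what makes this kind of model usable for attributing synthetic difficulty to individual parts of a molecule.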

Keywords
Bayesian analysis, Bernoulli naïve Bayes, Synthetic accessibility
Publication type
Journal Article (MeSH)
