A deep learning genome-mining strategy for biosynthetic gene cluster prediction
GD. Hannigan, D. Prihoda, A. Palicka, J. Soukup, O. Klempir, L. Rampula, J. Durcak, M. Wurst, J. Kotowski, D. Chang, R. Wang, G. Piizzi, G. Temesi, DJ. Hazuda, CH. Woelk, DA. Bitton
Language: English
Country: Great Britain
Document type: Journal Article; Research Support, Non-U.S. Gov't
NLK
Directory of Open Access Journals
from 2005
Free Medical Journals
from 1996
PubMed Central
from 1974
Europe PubMed Central
from 1974
Open Access Digital Library
from 1996-01-01 to 2030-12-31
Open Access Digital Library
from 1974-01-01
Open Access Digital Library
from 1996-01-01
Medline Complete (EBSCOhost)
from 1996-01-01
Oxford Journals Open Access Collection
from 1996-01-01
ROAD: Directory of Open Access Scholarly Resources
from 1974
PubMed: 31400112
DOI: 10.1093/nar/gkz654
- MeSH
- Biosynthetic Pathways / genetics
- Data Mining / methods
- Deep Learning
- Genome, Bacterial / genetics
- Genome
- Multigene Family / genetics
- Computational Biology / methods
- Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
AI and Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic
Bioinformatics and Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic
Data Science, MSD Czech Republic s.r.o., Prague, Czech Republic
Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA
Genetics and Pharmacogenomics, Merck & Co., Inc., Boston, MA, USA
- 000
- 00000naa a2200000 a 4500
- 001
- bmc19044594
- 003
- CZ-PrNML
- 005
- 20200113081031.0
- 007
- ta
- 008
- 200109s2019 xxk f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1093/nar/gkz654 $2 doi
- 035 __
- $a (PubMed)31400112
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxk
- 100 1_
- $a Hannigan, Geoffrey D $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
- 245 12
- $a A deep learning genome-mining strategy for biosynthetic gene cluster prediction / $c GD. Hannigan, D. Prihoda, A. Palicka, J. Soukup, O. Klempir, L. Rampula, J. Durcak, M. Wurst, J. Kotowski, D. Chang, R. Wang, G. Piizzi, G. Temesi, DJ. Hazuda, CH. Woelk, DA. Bitton,
- 520 9_
- $a Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represents a major addition to in-silico BGC identification.
- 650 _2
- $a Biosynthetic Pathways $x genetics $7 D053898
- 650 _2
- $a Computational Biology $x methods $7 D019295
- 650 _2
- $a Data Mining $x methods $7 D057225
- 650 _2
- $a Deep Learning $7 D000077321
- 650 _2
- $a Genome $7 D016678
- 650 _2
- $a Genome, Bacterial $x genetics $7 D016680
- 650 _2
- $a Multigene Family $x genetics $7 D005810
- 655 _2
- $a Journal Article $7 D016428
- 655 _2
- $a Research Support, Non-U.S. Gov't $7 D013485
- 700 1_
- $a Prihoda, David $u Big Data Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic. Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, Czech Republic.
- 700 1_
- $a Palicka, Andrej $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Soukup, Jindrich $u Data Science, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Klempir, Ondrej $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Rampula, Lena $u NLP, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Durcak, Jindrich $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Wurst, Michael $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Kotowski, Jakub $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Chang, Dan $u Genetics & Pharmacogenomics, Merck & Co., Inc., Boston, MA, USA.
- 700 1_
- $a Wang, Rurun $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
- 700 1_
- $a Piizzi, Grazia $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
- 700 1_
- $a Temesi, Gergely $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 700 1_
- $a Hazuda, Daria J $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA. Infectious Diseases and Vaccine Research, MRL, Merck & Co., Inc., West Point, PA, USA.
- 700 1_
- $a Woelk, Christopher H $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
- 700 1_
- $a Bitton, Danny A $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
- 773 0_
- $w MED00003554 $t Nucleic acids research $x 1362-4962 $g Vol. 47, No. 18 (2019), p. e110
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/31400112 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y a $z 0
- 990 __
- $a 20200109 $b ABA008
- 991 __
- $a 20200113081403 $b ABA008
- 999 __
- $a ok $b bmc $g 1482863 $s 1083267
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2019 $b 47 $c 18 $d e110 $e 20191010 $i 1362-4962 $m Nucleic acids research $n Nucleic Acids Res $x MED00003554
- LZP __
- $a Pubmed-20200109