A deep learning genome-mining strategy for biosynthetic gene cluster prediction

GD. Hannigan, D. Prihoda, A. Palicka, J. Soukup, O. Klempir, L. Rampula, J. Durcak, M. Wurst, J. Kotowski, D. Chang, R. Wang, G. Piizzi, G. Temesi, DJ. Hazuda, CH. Woelk, DA. Bitton

Nucleic acids research. 2019 ; 47 (18) : e110. [pub] 20191010

Language English Country Great Britain

Document type Journal Article, Research Support, Non-U.S. Gov't

Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in-silico BGC identification.

000      
00000naa a2200000 a 4500
001      
bmc19044594
003      
CZ-PrNML
005      
20200113081031.0
007      
ta
008      
200109s2019 xxk f 000 0|eng||
009      
AR
024    7_
$a 10.1093/nar/gkz654 $2 doi
035    __
$a (PubMed)31400112
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxk
100    1_
$a Hannigan, Geoffrey D $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
245    12
$a A deep learning genome-mining strategy for biosynthetic gene cluster prediction / $c GD. Hannigan, D. Prihoda, A. Palicka, J. Soukup, O. Klempir, L. Rampula, J. Durcak, M. Wurst, J. Kotowski, D. Chang, R. Wang, G. Piizzi, G. Temesi, DJ. Hazuda, CH. Woelk, DA. Bitton
520    9_
$a Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in-silico BGC identification.
650    _2
$a biosynthetic pathways $x genetics $7 D053898
650    _2
$a computational biology $x methods $7 D019295
650    _2
$a data mining $x methods $7 D057225
650    _2
$a deep learning $7 D000077321
650    _2
$a genome $7 D016678
650    _2
$a genome, bacterial $x genetics $7 D016680
650    _2
$a multigene family $x genetics $7 D005810
655    _2
$a journal article $7 D016428
655    _2
$a research support, non-U.S. gov't $7 D013485
700    1_
$a Prihoda, David $u Big Data Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic. Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology, Prague, Czech Republic.
700    1_
$a Palicka, Andrej $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Soukup, Jindrich $u Data Science, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Klempir, Ondrej $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Rampula, Lena $u NLP, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Durcak, Jindrich $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Wurst, Michael $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Kotowski, Jakub $u AI & Big Data Analytics, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Chang, Dan $u Genetics & Pharmacogenomics, Merck & Co., Inc., Boston, MA, USA.
700    1_
$a Wang, Rurun $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
700    1_
$a Piizzi, Grazia $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
700    1_
$a Temesi, Gergely $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
700    1_
$a Hazuda, Daria J $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA. Infectious Diseases and Vaccine Research, MRL, Merck & Co., Inc., West Point, PA, USA.
700    1_
$a Woelk, Christopher H $u Exploratory Science Center, Merck & Co., Inc., Cambridge, Massachusetts, USA.
700    1_
$a Bitton, Danny A $u Bioinformatics & Cheminformatics Solutions, MSD Czech Republic s.r.o., Prague, Czech Republic.
773    0_
$w MED00003554 $t Nucleic acids research $x 1362-4962 $g Vol. 47, No. 18 (2019), p. e110
856    41
$u https://pubmed.ncbi.nlm.nih.gov/31400112 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20200109 $b ABA008
991    __
$a 20200113081403 $b ABA008
999    __
$a ok $b bmc $g 1482863 $s 1083267
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2019 $b 47 $c 18 $d e110 $e 20191010 $i 1362-4962 $m Nucleic acids research $n Nucleic Acids Res $x MED00003554
LZP    __
$a Pubmed-20200109
