A deep learning genome-mining strategy for biosynthetic gene cluster prediction

2019 Oct 10; 47(18): e110.

Language: English. Country: United Kingdom, England. Medium: print.

Document type: journal article; research supported by a grant.

Persistent link: https://www.medvik.cz/link/pmid31400112

Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The growing number of complete microbial genomes and similar resources has driven the development of BGC prediction algorithms, although their precision and their ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared with existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in silico BGC identification.
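The abstract's headline claim concerns the false positive rate (FPR): the fraction of truly non-BGC regions that a tool nonetheless labels as BGCs, FP / (FP + TN). A minimal Python sketch of that metric follows, assuming binary labels per genomic unit (1 = BGC, 0 = non-BGC). The function name and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
def false_positive_rate(y_true, y_pred):
    """Return FP / (FP + TN) for binary labels (1 = BGC, 0 = non-BGC).

    Hypothetical helper for illustration only; not DeepBGC's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # False positives: truly non-BGC units predicted as BGC.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # True negatives: truly non-BGC units correctly predicted as non-BGC.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no negative examples; FPR is undefined")
    return fp / negatives
```

For example, with true labels `[0, 0, 0, 0, 1, 1]` and predictions `[1, 0, 0, 0, 1, 0]`, one of the four true negatives is flagged as a BGC, giving an FPR of 0.25. A lower FPR means fewer spurious cluster calls for a given set of non-BGC regions.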

