EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities

2020 Jul 02; 48(W1): W104-W109.

Language: English. Country: England, United Kingdom. Medium: print

Document type: journal article, grant-supported research

Persistent link: https://www.medvik.cz/link/pmid32392342

Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Although genomic databases are growing exponentially, classical biochemical characterization techniques remain time-demanding, cost-ineffective and low-throughput. Computational methods are therefore being developed to explore the unmapped sequence space efficiently. Selecting putative enzymes for biochemical characterization based on a rational and robust analysis of all available sequences remains an unsolved problem. To address this challenge, we have developed EnzymeMiner, a web server for automated screening and annotation of diverse family members that enables selection of hits for wet-lab experiments. EnzymeMiner prioritizes sequences that are more likely to preserve the catalytic activity and to be heterologously expressible in a soluble form in Escherichia coli. The solubility prediction employs the in-house SoluProt predictor, developed using machine learning. EnzymeMiner reduces the time devoted to data gathering, multi-step analysis, sequence prioritization and selection from days to hours. A successful use case for the haloalkane dehalogenase family is described in a comprehensive tutorial available on the EnzymeMiner web page. EnzymeMiner is a universal tool applicable to any enzyme family, with an interactive and easy-to-use web interface freely available at https://loschmidt.chemi.muni.cz/enzymeminer/.
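The prioritization idea in the abstract can be illustrated with a minimal sketch. It is not EnzymeMiner's implementation: the `Candidate` fields, the `prioritize` function, the 0.5 threshold and the example records are all hypothetical, and the sketch assumes only that each candidate carries a flag for conserved catalytic residues and a predicted solubility score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    accession: str                # hypothetical sequence identifier
    has_essential_residues: bool  # assumed flag: catalytic residues conserved
    solubility: float             # assumed predicted solubility score in [0, 1]


def prioritize(candidates, min_solubility=0.5):
    """Keep candidates with conserved catalytic residues and a solubility
    score at or above the (hypothetical) threshold, most soluble first."""
    kept = [
        c for c in candidates
        if c.has_essential_residues and c.solubility >= min_solubility
    ]
    return sorted(kept, key=lambda c: c.solubility, reverse=True)


# Made-up example records, for illustration only.
candidates = [
    Candidate("seq_A", True, 0.82),
    Candidate("seq_B", False, 0.91),  # dropped: catalytic residues not conserved
    Candidate("seq_C", True, 0.34),   # dropped: below the solubility threshold
    Candidate("seq_D", True, 0.67),
]

ranked = prioritize(candidates)
print([c.accession for c in ranked])  # → ['seq_A', 'seq_D']
```

Filtering before ranking keeps a highly soluble but likely inactive sequence (here `seq_B`) from reaching the top of the list, which mirrors the two criteria named in the abstract.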


