EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities
Language: English. Country: England, United Kingdom. Medium: print
Document type: journal article, grant-supported research
PubMed: 32392342
PubMed Central: PMC7319543
DOI: 10.1093/nar/gkaa372
PII: 5835821
- MeSH
- biocatalysis MeSH
- enzymes: chemistry, metabolism MeSH
- hydrolases: chemistry MeSH
- solubility MeSH
- sequence analysis, protein MeSH
- sequence homology, amino acid MeSH
- software * MeSH
- enzyme stability MeSH
- Publication type
- journal article MeSH
- grant-supported research MeSH
- Substance names
- enzymes MeSH
- haloalkane dehalogenase MeSH
- hydrolases MeSH
Millions of protein sequences are being discovered at a rapid pace, representing an inexhaustible source of biocatalysts. Although genomic databases are growing exponentially, classical biochemical characterization techniques remain time-consuming, costly and low-throughput. Computational methods are therefore being developed to explore the unmapped sequence space efficiently. Selecting putative enzymes for biochemical characterization on the basis of a rational and robust analysis of all available sequences remains an unsolved problem. To address this challenge, we have developed EnzymeMiner, a web server for automated screening and annotation of diverse family members that enables selection of hits for wet-lab experiments. EnzymeMiner prioritizes sequences that are more likely to preserve the catalytic activity and to be heterologously expressible in a soluble form in Escherichia coli. The solubility prediction employs the in-house SoluProt predictor, developed using machine learning. EnzymeMiner reduces the time devoted to data gathering, multi-step analysis, sequence prioritization and selection from days to hours. A successful use case for the haloalkane dehalogenase family is described in a comprehensive tutorial available on the EnzymeMiner web page. EnzymeMiner is a universal tool applicable to any enzyme family and provides an interactive, easy-to-use web interface, freely available at https://loschmidt.chemi.muni.cz/enzymeminer/.
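The prioritization idea in the abstract (keep hits likely to preserve catalytic activity, then prefer those predicted to be soluble) can be illustrated with a minimal Python sketch. This is not EnzymeMiner's implementation: the `Candidate` record, the position labels, the allowed-residue sets, the 0.5 threshold and all scores are hypothetical, and the solubility value is assumed to have been computed beforehand by an external predictor.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Candidate:
    """A hypothetical homology-search hit; every field is illustrative."""
    accession: str
    # Residue found at each essential position (labels are made up here).
    essential_residues: Dict[str, str]
    # Predicted solubility in [0, 1], assumed to come from an external predictor.
    solubility: float


def has_essential_residues(hit: Candidate, allowed: Dict[str, Set[str]]) -> bool:
    """True if every essential position carries one of its allowed residues."""
    return all(
        hit.essential_residues.get(position) in residues
        for position, residues in allowed.items()
    )


def prioritize(
    hits: List[Candidate],
    allowed: Dict[str, Set[str]],
    min_solubility: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Candidate]:
    """Drop hits lacking essential residues or below the solubility cut-off,
    then rank the remainder by predicted solubility, highest first."""
    kept = [
        h for h in hits
        if has_essential_residues(h, allowed) and h.solubility >= min_solubility
    ]
    return sorted(kept, key=lambda h: h.solubility, reverse=True)


# Made-up example data.
allowed = {"nucleophile": {"D"}, "base": {"H"}}
hits = [
    Candidate("hit_A", {"nucleophile": "D", "base": "H"}, 0.82),
    Candidate("hit_B", {"nucleophile": "A", "base": "H"}, 0.91),  # wrong residue
    Candidate("hit_C", {"nucleophile": "D", "base": "H"}, 0.35),  # low solubility
    Candidate("hit_D", {"nucleophile": "D", "base": "H"}, 0.67),
]
ranked = prioritize(hits, allowed)
print([h.accession for h in ranked])
```

In this sketch `hit_B` is discarded despite having the highest score, because the residue filter is applied before ranking; whether the real server orders its steps this way is not stated in the abstract.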
Mechanism-Based Design of Efficient PET Hydrolases
SoluProt: prediction of soluble protein expression in Escherichia coli