Most cited: 33416864

4 citations in PubMed

Most cited article - PubMed ID 33416864

SoluProt: prediction of soluble protein expression in Escherichia coli

Bioinformatics (Oxford, England). 2021 Apr 09 ; 37 (1) : 23-28.

Bioinformatics
ISSN 1367-4811 | 1367-4803
Source

Article

Engineering Dehalogenase Enzymes Using Variational Autoencoder-Generated Latent Spaces and Microfluidics

Authors: Kohout, Pavel; Vasina, Michal; Majerova, Marika; Novakova, Veronika; Damborsky, Jiri; Bednar, David; Marek, Martin; Prokop, Zbynek; Mazurenko, Stanislav

Affiliations (listed for every author): Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Brno 611 37, Czech Republic; International Clinical Research Centre, St. Anne's Hospital, Brno 656 91, Czech Republic

JACS Au. 2025 Feb 24 ; 5 (2) : 838-850. [epub] 20250213

ISSN 2691-3704
Source

Enzymes play a crucial role in sustainable industrial applications, with their optimization posing a formidable challenge due to the intricate interplay among residues. Computational methodologies predominantly rely on evolutionary insights of homologous sequences. However, deciphering the evolutionary variability and complex dependencies among residues presents substantial hurdles. Here, we present a new machine-learning method based on variational autoencoders and an evolutionary sampling strategy to address those limitations. We customized our method to generate novel sequences of model enzymes, haloalkane dehalogenases. Three design-build-test cycles improved the solubility of variants from 11% to 75%. Thorough experimental validation including the microfluidic device MicroPEX resulted in 20 multiple-point variants. Nine of them, sharing as little as 67% sequence similarity with the template, showed a melting temperature increase of up to 9 °C and an average improvement of 3 °C. The most stable variant demonstrated a 3.5-fold increase in activity compared to the template. High-quality experimental data collected with 20 variants represent a valuable data set for the critical validation of novel protein design approaches. Python scripts, Jupyter notebooks, and data sets are available on GitHub (https://github.com/loschmidt/vae-dehalogenases), and interactive calculations will be possible via https://loschmidt.chemi.muni.cz/fireprotasr/.

Publication type
Journal Article

Article

GASP: A Pan-Specific Predictor of Family 1 Glycosyltransferase Acceptor Specificity Enabled by a Pipeline for Substrate Feature Generation and Large-Scale Experimental Screening

ACS omega. 2024 Jun 25 ; 9 (25) : 27278-27288. [epub] 20240611

ACS Omega
ISSN 2470-1343
Source

Glycosylation represents a major chemical challenge; while it is one of the most common reactions in Nature, conventional chemistry struggles with stereochemistry, regioselectivity, and solubility issues. In contrast, family 1 glycosyltransferase (GT1) enzymes can glycosylate virtually any given nucleophilic group with perfect control over stereochemistry and regioselectivity. However, the appropriate catalyst for a given reaction needs to be identified among the tens of thousands of available sequences. Here, we present the glycosyltransferase acceptor specificity predictor (GASP) model, a data-driven approach to the identification of reactive GT1:acceptor pairs. We trained a random forest-based acceptor predictor on literature data and validated it on independent in-house generated data on 1001 GT1:acceptor pairs, obtaining an AUROC of 0.79 and a balanced accuracy of 72%. The performance was stable even in the case of completely new GT1s and acceptors not present in the training data set, highlighting the pan-specificity of GASP. Moreover, the model is capable of parsing all known GT1 sequences, as well as all chemicals, the latter through a pipeline for the generation of 153 chemical features for a given molecule taking the CID or SMILES as input (freely available at https://github.com/degnbol/GASP). To investigate the power of GASP, the model prediction probability scores were compared to GT1 substrate conversion yields from a newly published data set, with the top 50% of GASP predictions corresponding to reactions with >50% synthetic yields. The model was also tested in two comparative case studies: glycosylation of the anthelmintic drug niclosamide and the plant defensive compound DIBOA. In the first study, the model achieved an 83% hit rate, outperforming a hit rate of 53% from a random selection assay. In the second case study, the hit rate of GASP was 50%, and while being lower than the hit rate of 83% using expert-selected enzymes, it provides a reasonable performance for the cases when an expert opinion is unavailable. The hierarchical importance of the generated chemical features was investigated by negative feature selection, revealing properties related to cyclization and atom hybridization status to be the most important characteristics for accurate prediction. Our study provides a GT1:acceptor predictor which can be trained on other data sets enabled by the automated feature generation pipelines. We also release the new in-house generated data set used for testing of GASP to facilitate the future development of GT1 activity predictors and their robust benchmarking.
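
For readers unfamiliar with the two validation metrics quoted in this abstract, the following is a minimal sketch of how AUROC and balanced accuracy are defined. It is not taken from the GASP repository: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration only.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration; NOT from the GASP data set.
# 1 = reactive enzyme:acceptor pair, 0 = non-reactive pair.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
# Hypothetical decision threshold of 0.5 on the predicted probability.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"balanced accuracy = {balanced_accuracy(labels, preds):.3f}")
print(f"AUROC = {auroc(labels, scores):.3f}")
```

Balanced accuracy weights both classes equally, which matters when non-reactive pairs outnumber reactive ones, while AUROC is threshold-free and summarizes how well the scores rank reactive pairs above non-reactive ones.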

Publication type
Journal Article

Article

Machine Learning-Guided Protein Engineering

ACS catalysis. 2023 Nov 03 ; 13 (21) : 13863-13895. [epub] 20231013

ACS Catal
ISSN 2155-5435
Source

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.

Publication type
Journal Article
Review

Article

The nascent polypeptide-associated complex (NAC) controls translation initiation in cis by recruiting nucleolin to the encoding mRNA

Nucleic acids research. 2022 Sep 23 ; 50 (17) : 10110-10122.

Nucleic Acids Res
ISSN 1362-4962 | 0305-1048
Source

Protein aggregates and abnormal proteins are toxic and associated with neurodegenerative diseases. There are several mechanisms to help cells get rid of aggregates, but little is known about how cells prevent aggregate-prone proteins from being synthesised. The EBNA1 of the Epstein-Barr virus (EBV) evades the immune system by suppressing its own mRNA translation initiation in order to minimize the production of antigenic peptides for the major histocompatibility (MHC) class I pathway. Here we show that the emerging peptide of the disordered glycine-alanine repeat (GAr) within EBNA1 dislodges the nascent polypeptide-associated complex (NAC) from the ribosome. This results in the recruitment of nucleolin to the GAr-encoding mRNA and suppression of mRNA translation initiation in cis. Suppressing NAC alpha (NACA) expression prevents nucleolin from binding to the GAr mRNA and overcomes GAr-mediated translation inhibition. Taken together, these observations suggest that EBNA1 exploits a nascent protein quality control pathway to regulate its own rate of synthesis that is based on sensing the nascent GAr peptide by NAC followed by the recruitment of nucleolin to the GAr-encoding RNA sequence.

MeSH
Alanine
Phosphoproteins
Glycine
Epstein-Barr Virus Infections *
Humans
RNA, Messenger genetics metabolism
Nucleolin
Peptides genetics
Protein Aggregates
RNA-Binding Proteins metabolism
Epstein-Barr Virus Nuclear Antigens metabolism
Herpesvirus 4, Human * genetics
Check Tag
Humans
Publication type
Journal Article
Research Support, Non-U.S. Gov't
Names of Substances
Alanine
Phosphoproteins
Glycine
RNA, Messenger
Peptides
Protein Aggregates
RNA-Binding Proteins
Epstein-Barr Virus Nuclear Antigens
