Most cited article - PubMed ID 29796670
HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information
Enzymes play a crucial role in sustainable industrial applications, with their optimization posing a formidable challenge due to the intricate interplay among residues. Computational methodologies predominantly rely on evolutionary insights of homologous sequences. However, deciphering the evolutionary variability and complex dependencies among residues presents substantial hurdles. Here, we present a new machine-learning method based on variational autoencoders and evolutionary sampling strategy to address those limitations. We customized our method to generate novel sequences of model enzymes, haloalkane dehalogenases. Three design-build-test cycles improved the solubility of variants from 11% to 75%. Thorough experimental validation including the microfluidic device MicroPEX resulted in 20 multiple-point variants. Nine of them, sharing as little as 67% sequence similarity with the template, showed a melting temperature increase of up to 9 °C and an average improvement of 3 °C. The most stable variant demonstrated a 3.5-fold increase in activity compared to the template. High-quality experimental data collected with 20 variants represent a valuable data set for the critical validation of novel protein design approaches. Python scripts, jupyter notebooks, and data sets are available on GitHub (https://github.com/loschmidt/vae-dehalogenases), and interactive calculations will be possible via https://loschmidt.chemi.muni.cz/fireprotasr/.
- Publication type
- Journal Article MeSH
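The abstract above reports variants "sharing as little as 67% sequence similarity with the template". It does not say how similarity was computed, so the following is only a minimal sketch of a related and simpler quantity, percent identity over two sequences that are assumed to be already aligned. The function name `percent_identity` and the gap symbol are illustrative assumptions and are not taken from the authors' repository.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length. This is identity, not a substitution-matrix
    based similarity score.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration only, not real dehalogenase data.
    print(percent_identity("ACDE", "ACDF"))
```

Different tools normalise by different lengths (alignment length, shorter sequence, non-gap columns), so a value computed this way may not match a published similarity figure.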
Every year, more than 19 million cancer cases are diagnosed, and this number continues to increase annually. Since standard treatment options have varying success rates for different types of cancer, understanding the biology of an individual's tumour becomes crucial, especially for cases that are difficult to treat. Personalised high-throughput profiling, using next-generation sequencing, allows for a comprehensive examination of biopsy specimens. Furthermore, the widespread use of this technology has generated a wealth of information on cancer-specific gene alterations. However, there exists a significant gap between identified alterations and their proven impact on protein function. Here, we present a bioinformatics pipeline that enables fast analysis of a missense mutation's effect on stability and function in known oncogenic proteins. This pipeline is coupled with a predictor that summarises the outputs of different tools used throughout the pipeline, providing a single probability score, achieving a balanced accuracy above 86%. The pipeline incorporates a virtual screening method to suggest potential FDA/EMA-approved drugs to be considered for treatment. We showcase three case studies to demonstrate the timely utility of this pipeline. To facilitate access and analysis of cancer-related mutations, we have packaged the pipeline as a web server, which is freely available at https://loschmidt.chemi.muni.cz/predictonco/.
Scientific contribution: This work presents a novel bioinformatics pipeline that integrates multiple computational tools to predict the effects of missense mutations on proteins of oncological interest. The pipeline uniquely combines fast protein modelling, stability prediction, and evolutionary analysis with virtual drug screening, while offering actionable insights for precision oncology. This comprehensive approach surpasses existing tools by automating the interpretation of mutations and suggesting potential treatments, thereby striving to bridge the gap between sequencing data and clinical application.
- Keywords
- Bioinformatics, Cancer, Function, High-performance computing, Machine learning, Molecular modelling, Oncology, Personalised medicine, Single nucleotide polymorphism, Stability, Treatment
- Publication type
- Journal Article MeSH
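The pipeline abstract above quotes a balanced accuracy above 86% for its predictor. As a reminder of what that metric measures, here is a minimal sketch for the binary case, where balanced accuracy is the mean of sensitivity and specificity. The function name and the 0/1 label encoding are assumptions made for illustration; the abstract does not describe how the authors computed the figure.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive).

    Assumption: labels are encoded as 0 and 1, and both classes occur in
    y_true; otherwise one of the two rates is undefined.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    print(balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0]))
```

Unlike plain accuracy, this metric is not inflated when one class dominates the data set, which is presumably why it is reported for mutation-effect predictors with unbalanced classes.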
NanoLuc, a superior β-barrel fold luciferase, was engineered 10 years ago but the nature of its catalysis remains puzzling. Here experimental and computational techniques are combined, revealing that imidazopyrazinone luciferins bind to an intra-barrel catalytic site but also to an allosteric site shaped on the enzyme surface. Structurally, binding to the allosteric site prevents simultaneous binding to the catalytic site, and vice versa, through concerted conformational changes. We demonstrate that restructuration of the allosteric site can boost the luminescent reaction in the remote active site. Mechanistically, an intra-barrel arginine coordinates the imidazopyrazinone component of luciferin, which reacts with O2 via a radical charge-transfer mechanism, and then it also protonates the resulting excited amide product to form a light-emitting neutral species. Concomitantly, an aspartate, supported by two tyrosines, fine-tunes the blue color emitter to secure a high emission intensity. This information is critical to engineering the next-generation of ultrasensitive bioluminescent reporters.
- MeSH
- Catalytic Domain MeSH
- Luciferases metabolism MeSH
- Luminescent Measurements * MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Luciferases MeSH
- nanoluc MeSH Browser
PredictONCO 1.0 is a unique web server that analyzes effects of mutations on proteins frequently altered in various cancer types. The server can assess the impact of mutations on the protein sequential and structural properties and apply a virtual screening to identify potential inhibitors that could be used as a highly individualized therapeutic approach, possibly based on the drug repurposing. PredictONCO integrates predictive algorithms and state-of-the-art computational tools combined with information from established databases. The user interface was carefully designed for the target specialists in precision oncology, molecular pathology, clinical genetics and clinical sciences. The tool summarizes the effect of the mutation on protein stability and function and currently covers 44 common oncological targets. The binding affinities of Food and Drug Administration/European Medicines Agency-approved drugs with the wild-type and mutant proteins are calculated to facilitate treatment decisions. The reliability of predictions was confirmed against 108 clinically validated mutations. The server provides a fast and compact output, ideal for the often time-sensitive decision-making process in oncology. Three use cases of missense mutations, (i) K22A in cyclin-dependent kinase 4 identified in melanoma, (ii) E1197K mutation in anaplastic lymphoma kinase 4 identified in lung carcinoma and (iii) V765A mutation in epidermal growth factor receptor in a patient with congenital mismatch repair deficiency highlight how the tool can increase levels of confidence regarding the pathogenicity of the variants and identify the most effective inhibitors. The server is available at https://loschmidt.chemi.muni.cz/predictonco.
- Keywords
- cancer, oncology, personalized medicine, single-nucleotide polymorphism, targeted therapy
- MeSH
- Precision Medicine * MeSH
- Humans MeSH
- Melanoma * MeSH
- Mutation MeSH
- Proteins MeSH
- Reproducibility of Results MeSH
- Machine Learning MeSH
- Computational Biology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins MeSH
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
- Publication type
- Journal Article MeSH
- Review MeSH
Thermostability is an essential requirement for the use of enzymes in the bioindustry. Here, we compare different protein stabilization strategies using a challenging target, a stable haloalkane dehalogenase DhaA115. We observe better performance of automated stabilization platforms FireProt and PROSS in designing multiple-point mutations over the introduction of disulfide bonds and strengthening the intra- and the inter-domain contacts by in silico saturation mutagenesis. We reveal that the performance of automated stabilization platforms was still compromised due to the introduction of some destabilizing mutations. Notably, we show that their prediction accuracy can be improved by applying manual curation or machine learning for the removal of potentially destabilizing mutations, yielding highly stable haloalkane dehalogenases with enhanced catalytic properties. A comparison of crystallographic structures revealed that current stabilization rounds were not accompanied by large backbone re-arrangements previously observed during the engineering stability of DhaA115. Stabilization was achieved by improving local contacts including protein-water interactions. Our study provides guidance for further improvement of automated structure-based computational tools for protein stabilization.
- Publication type
- Journal Article MeSH
Haloalkane dehalogenase (HLD) enzymes employ an SN2 nucleophilic substitution mechanism to erase halogen substituents in diverse organohalogen compounds. Subfamily I and II HLDs are well-characterized enzymes, but the mode and purpose of multimerization of subfamily III HLDs are unknown. Here we probe the structural organization of DhmeA, a subfamily III HLD-like enzyme from the archaeon Haloferax mediterranei, by combining cryo-electron microscopy (cryo-EM) and x-ray crystallography. We show that full-length wild-type DhmeA forms diverse quaternary structures, ranging from small oligomers to large supramolecular ring-like assemblies of various sizes and symmetries. We optimized sample preparation steps, enabling three-dimensional reconstructions of an oligomeric species by single-particle cryo-EM. Moreover, we engineered a crystallizable mutant (DhmeAΔGG) that provided diffraction-quality crystals. The 3.3 Å crystal structure reveals that DhmeAΔGG forms a ring-like 20-mer structure with outer and inner diameter of ~200 and ~80 Å, respectively. An enzyme homodimer represents a basic repeating building unit of the crystallographic ring. Three assembly interfaces (dimerization, tetramerization, and multimerization) were identified to form the supramolecular ring that displays a negatively charged exterior, while its interior part harboring catalytic sites is positively charged. Localization and exposure of catalytic machineries suggest a possible processing of large negatively charged macromolecular substrates.
- Keywords
- DhmeA, Haloferax mediterranei, catalysis, cryo-EM, haloalkane dehalogenase, multimerization, x-ray crystallography
- MeSH
- Cryoelectron Microscopy methods MeSH
- Hydrolases * chemistry MeSH
- Crystallography, X-Ray MeSH
- Substrate Specificity MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- haloalkane dehalogenase MeSH Browser
- Hydrolases * MeSH
Protein solubility is an attractive engineering target primarily due to its relation to yields in protein production and manufacturing. Moreover, better knowledge of the mutational effects on protein solubility could connect several serious human diseases with protein aggregation. However, we have limited understanding of the protein structural determinants of solubility, and the available data have mostly been scattered in the literature. Here, we present SoluProtMutDB - the first database containing data on protein solubility changes upon mutations. Our database accommodates 33 000 measurements of 17 000 protein variants in 103 different proteins. The database can serve as an essential source of information for the researchers designing improved protein variants or those developing machine learning tools to predict the effects of mutations on solubility. The database comprises all the previously published solubility datasets and thousands of new data points from recent publications, including deep mutational scanning experiments. Moreover, it features many available experimental conditions known to affect protein solubility. The datasets have been manually curated with substantial corrections, improving suitability for machine learning applications. The database is available at loschmidt.chemi.muni.cz/soluprotmutdb.
- Keywords
- Machine learning, Mutational database, Protein aggregation, Protein engineering, Protein yield, Soluble expression
- Publication type
- Journal Article MeSH
The wide variety of protein structures and functions results from the diverse properties of the 20 canonical amino acids. The generally accepted hypothesis is that early protein evolution was associated with enrichment of a primordial alphabet, thereby enabling increased protein catalytic efficiencies and functional diversification. Aromatic amino acids were likely among the last additions to genetic code. The main objective of this study was to test whether enzyme catalysis can occur without the aromatic residues (aromatics) by studying the structure and function of dephospho-CoA kinase (DPCK) following aromatic residue depletion. We designed two variants of a putative DPCK from Aquifex aeolicus by substituting (a) Tyr, Phe and Trp or (b) all aromatics (including His). Their structural characterization indicates that substituting the aromatics does not markedly alter their secondary structures but does significantly loosen their side chain packing and increase their sizes. Both variants still possess ATPase activity, although with 150-300 times lower efficiency in comparison with the wild-type phosphotransferase activity. The transfer of the phosphate group to the dephospho-CoA substrate becomes heavily uncoupled and only the His-containing variant is still able to perform the phosphotransferase reaction. These data support the hypothesis that proteins in the early stages of life could support catalytic activities, albeit with low efficiencies. An observed significant contraction upon ligand binding is likely important for appropriate organization of the active site. Formation of firm hydrophobic cores, which enable the assembly of stably structured active sites, is suggested to provide a selective advantage for adding the aromatic residues.
- Keywords
- aromatic amino acids, catalysis evolution, genetic code evolution, protein disorder, protein structure evolution
- MeSH
- Aquifex enzymology genetics MeSH
- Bacterial Proteins chemistry genetics MeSH
- Phosphotransferases (Alcohol Group Acceptor) chemistry genetics MeSH
- Catalytic Domain MeSH
- Catalysis MeSH
- Mutagenesis, Site-Directed MeSH
- Protein Structure, Secondary MeSH
- Amino Acid Substitution MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Bacterial Proteins MeSH
- dephospho-CoA kinase MeSH Browser
- Phosphotransferases (Alcohol Group Acceptor) MeSH
The majority of naturally occurring proteins have evolved to function under mild conditions inside the living organisms. One of the critical obstacles for the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used for narrowing down the mutational landscape. The recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present a novel database of experimental thermostability data for single-point mutants, FireProtDB. The database combines the published datasets, data extracted manually from the recent literature, and the data collected in our laboratory. Its user interface is designed to facilitate both types of the expected use: (i) the interactive explorations of individual entries on the level of a protein or mutation and (ii) the construction of highly customized and machine learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb.
- MeSH
- Molecular Sequence Annotation MeSH
- Point Mutation * MeSH
- Databases, Protein * MeSH
- Datasets as Topic MeSH
- Internet MeSH
- Models, Molecular MeSH
- Proteins chemistry genetics MeSH
- Software MeSH
- Protein Stability MeSH
- Machine Learning statistics & numerical data MeSH
- Computational Biology methods MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins MeSH
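The FireProtDB abstract above mentions building "machine learning-friendly datasets using advanced searching and filtering". As a rough sketch of what such a preparation step could look like once records have been exported, the code below filters out implausible values and averages repeated measurements of the same mutation. Everything here is hypothetical: the `MutationRecord` fields, the `build_dataset` helper, the threshold, and the toy values are assumptions for illustration and do not reflect the actual FireProtDB schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationRecord:
    # Hypothetical fields; not the FireProtDB export format.
    protein: str             # protein identifier
    mutation: str            # single-point mutation, e.g. "A10G"
    stability_change: float  # measured stability change (toy values)


def build_dataset(records, max_abs_change=10.0):
    """Return {(protein, mutation): mean stability change}.

    Records whose absolute value exceeds max_abs_change are dropped as
    likely outliers (an assumed cutoff), and duplicate measurements of the
    same mutation are averaged so each mutation appears once.
    """
    grouped = {}
    for record in records:
        if abs(record.stability_change) > max_abs_change:
            continue  # skip values outside the assumed plausible range
        key = (record.protein, record.mutation)
        grouped.setdefault(key, []).append(record.stability_change)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Toy records for illustration only, not real measurements.
    toy = [
        MutationRecord("P1", "A10G", 1.0),
        MutationRecord("P1", "A10G", 2.0),
        MutationRecord("P1", "L5K", 50.0),
    ]
    print(build_dataset(toy))
```

Collapsing duplicates before splitting into training and test sets is one common way to avoid the same mutation appearing on both sides of the split; whether averaging is appropriate depends on how comparable the experimental conditions are.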