protein structure prediction
The ability to predict and design protein structures has led to numerous applications in medicine, diagnostics and sustainable chemical manufacture. In addition, the wealth of predicted protein structures has advanced our understanding of how life's molecules function and interact. Honouring the work that has fundamentally changed the way scientists research and engineer proteins, the Nobel Prize in Chemistry in 2024 was awarded to David Baker for computational protein design and jointly to Demis Hassabis and John Jumper, who developed AlphaFold for machine-learning-based protein structure prediction. Here, we highlight notable contributions to the development of these computational tools and their importance for the design of functional proteins that are applied in organic synthesis. Notably, both technologies have the potential to impact drug discovery as any therapeutic protein target can now be modelled, allowing the de novo design of peptide binders and the identification of small molecule ligands through in silico docking of large compound libraries. Looking ahead, we highlight future research directions in protein engineering, medicinal chemistry and material design that are enabled by this transformative shift in protein science.
- Keywords
- AlphaFold, Computational protein design, Nobel prize, Protein engineering, Protein structure prediction
- MeSH
- Biocatalysis MeSH
- Protein Conformation MeSH
- Protein Engineering MeSH
- Proteins * chemistry metabolism MeSH
- Machine Learning MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Proteins * MeSH
Proteins are naturally formed by domains that define their functional and structural properties. A domain taken out of the context of an entire protein can retain its structure and, to some extent, its function. These properties justify the construction of artificial multidomain fusion proteins with unique combinations of functions. Information on the specific functional and structural characteristics of individual domains in the context of new artificial fusion proteins is inevitably encoded in the sequential order of the constituent domains, which defines their mutual spatial positions. The challenges in designing new proteins with new domain combinations therefore lie predominantly in structure/function prediction and its context dependency. Despite the enormous body of publications on artificial fusion proteins, predicting their structure and function is complex and nontrivial. The degree of spatial freedom provided by a linker between domains, together with their mutual orientation driven by noncovalent interactions, puts structure prediction with reasonable accuracy beyond any simple and straightforward methodology. In the presented manuscript, we tested a methodology using available modeling tools and computational methods. We show that the process and methodology of such prediction are not straightforward and must be applied with care even when the recently introduced AlphaFold II is used. We also addressed the question of benchmarking standards for the prediction of multidomain protein structures: x-ray or nuclear magnetic resonance experiments. From a study of six two-domain protein chimeras, their constituent domains, and their x-ray structures selected from the PDB, we conclude that the major obstacle to justified prediction is inadequate sampling of the conformational space by the explored methods. On the other hand, we can still address particular steps of the methodology and improve the process of predicting chimeric proteins.
- Keywords
- 3D structure prediction, fusion proteins, molecular simulations, x-ray crystallography
- MeSH
- Protein Domains MeSH
- Proteins * chemistry MeSH
- Recombinant Fusion Proteins * chemistry MeSH
- X-Rays MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Proteins * MeSH
- Recombinant Fusion Proteins * MeSH
BACKGROUND: Ligand binding site prediction from protein structure has many applications related to the elucidation of protein function and structure-based drug discovery. It often represents only one of many steps in complex computational drug design efforts. Although many methods have been published to date, only a few of them are suitable for use in automated pipelines or for processing large datasets. These use cases require stability and speed, which disqualifies many of the recently introduced tools that are either template-based or available only as web servers. RESULTS: We present P2Rank, a stand-alone template-free tool for prediction of ligand binding sites based on machine learning. It is based on prediction of the ligandability of local chemical neighbourhoods that are centered on points placed on the solvent accessible surface of a protein. We show that P2Rank outperforms several existing tools, which include two widely used stand-alone tools (Fpocket, SiteHound), a comprehensive consensus-based tool (MetaPocket 2.0), and a recent deep learning-based method (DeepSite). P2Rank is among the fastest available tools (it requires under 1 s for prediction on one protein), with the additional advantage of a multi-threaded implementation. CONCLUSIONS: P2Rank is a new open source software package for ligand binding site prediction from protein structure. It is available as a user-friendly stand-alone command line program and a Java library. P2Rank has a lightweight installation and does not depend on other bioinformatics tools or large structural or sequence databases. Thanks to its speed and ability to make fully automated predictions, it is particularly well suited for processing large datasets or as a component of scalable structural bioinformatics pipelines.
- Keywords
- Binding site prediction, Ligand binding sites, Machine learning, Protein pockets, Protein surface descriptors, Random forests
- Publication type
- Journal Article MeSH
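The P2Rank abstract above describes scoring points on the solvent accessible surface by ligandability. The following is a minimal sketch of how such per-point scores might be grouped into ranked pockets. It assumes single-linkage clustering of high-scoring points and ranking by the sum of squared scores. The function name, the score threshold, and the linkage cutoff are illustrative assumptions, not P2Rank's API or its published parameters.

```python
from math import dist


def cluster_pockets(points, scores, score_threshold=0.5, link_cutoff=3.0):
    """Group high-scoring surface points into putative pockets.

    points: list of (x, y, z) coordinates of surface points.
    scores: per-point ligandability scores in [0, 1].
    score_threshold and link_cutoff are illustrative assumptions.
    """
    # Keep only points whose score reaches the threshold.
    kept = [i for i, s in enumerate(scores) if s >= score_threshold]
    parent = {i: i for i in kept}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage: join any two kept points closer than the cutoff.
    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if dist(points[a], points[b]) <= link_cutoff:
                parent[find(a)] = find(b)

    clusters = {}
    for i in kept:
        clusters.setdefault(find(i), []).append(i)

    # Rank pockets by the sum of squared point scores (an assumed scheme).
    pockets = [
        {"points": sorted(members), "score": sum(scores[i] ** 2 for i in members)}
        for members in clusters.values()
    ]
    return sorted(pockets, key=lambda p: p["score"], reverse=True)
```

In this sketch the per-point scores are an input; in a real pipeline they would come from a trained classifier such as the random forest named in the keywords.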
This article briefly describes our program Jamsek, written in FORTRAN for an ICL 2950/10 computer. Jamsek combines into a single algorithm the statistical and stereochemical rules most frequently encountered in the literature for predicting protein secondary structure from sequence. The composite algorithm does not work better than the best existing single algorithms of Garnier et al. (J. Mol. Biol., 120, 97-120, 1978) or Lim (J. Mol. Biol., 88, 873-894, 1974) if the percentage of residues with a correctly predicted secondary structure is taken as the criterion. However, it is fairly reliable in predicting the total amount of alpha-helices and beta-sheets in proteins, predicting the secondary structure of highly ordered proteins or their parts, and identifying long alpha-helices. It surpasses the previous algorithms by making it possible to assess the confidence of the prediction of particular secondary structure elements, thanks to the simultaneous availability of four independent predictions of the secondary structure and other relevant data (hydrophobic profile and helical wheel representation). The main body of this article is devoted to demonstrating that the output data of Jamsek can readily be used for the prediction of protein topological class, for the identification of globular proteins containing hydrophobic alpha-helices and, as an auxiliary means, for distinguishing between protein-coding and non-coding nucleotide sequences.
- MeSH
- Algorithms MeSH
- Chemical Phenomena MeSH
- Chemistry MeSH
- Protein Conformation * MeSH
- Mathematical Computing * MeSH
- Proteins classification MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Proteins MeSH
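The Jamsek abstract above notes that having four independent secondary-structure predictions makes it possible to judge the confidence of each predicted element. The abstract does not say how the predictions are combined, so the sketch below simply assumes a per-residue majority vote and reports the fraction of agreeing predictors as a confidence value. The function name and the one-letter state codes (H, E, C) are illustrative.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Combine equal-length secondary-structure strings by majority vote.

    predictions: list of strings such as "HHEC", one per prediction method.
    Returns (consensus_string, per_residue_agreement_fractions).
    Ties are resolved in favour of the state seen first in the column.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")

    consensus, confidence = [], []
    for column in zip(*predictions):
        # Most frequent state at this residue and how many methods chose it.
        state, votes = Counter(column).most_common(1)[0]
        consensus.append(state)
        confidence.append(votes / len(column))
    return "".join(consensus), confidence
```

Residues where all methods agree get a confidence of 1.0, while an even split between states signals an element that should be treated with caution.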
BACKGROUND: Protein-protein interactions (PPI) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. RESULTS: We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient. CONCLUSION: In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
- Keywords
- Data mining, Molecular fingerprints, Prediction, Protein-protein interaction
- MeSH
- Amino Acids * chemistry metabolism MeSH
- Databases, Protein MeSH
- Protein Interaction Mapping methods MeSH
- Proteins * chemistry metabolism MeSH
- Software * MeSH
- Models, Statistical MeSH
- Computational Biology MeSH
- Knowledge Bases * MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Amino Acids * MeSH
- Proteins * MeSH
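The INSPiRE abstract above describes labeling a residue as interface or non-interface according to how often its encoded structural neighborhood occurs with each label in a knowledge base. The sketch below illustrates only that lookup idea. The class name, the use of plain strings as fingerprints, and the 0.5 decision threshold are assumptions made for illustration and are not taken from the INSPiRE implementation.

```python
from collections import defaultdict


class NeighborhoodKnowledgeBase:
    """Counts how often each neighborhood fingerprint was seen per label."""

    def __init__(self):
        # fingerprint -> [interface_count, non_interface_count]
        self.counts = defaultdict(lambda: [0, 0])

    def add(self, fingerprint, is_interface):
        """Record one labeled observation of a fingerprint."""
        self.counts[fingerprint][0 if is_interface else 1] += 1

    def interface_fraction(self, fingerprint):
        """Fraction of observations labeled interface, or None if unseen."""
        if fingerprint not in self.counts:
            return None
        interface, non_interface = self.counts[fingerprint]
        return interface / (interface + non_interface)

    def predict(self, fingerprint, threshold=0.5):
        """Return True (interface), False (non-interface), or None (unseen).

        The threshold is an illustrative assumption.
        """
        fraction = self.interface_fraction(fingerprint)
        if fraction is None:
            return None
        return fraction >= threshold
```

A typical use would populate the knowledge base from residues of known complexes and then call `predict` on the fingerprint of each residue in a query protein. How unseen fingerprints are handled is left open here.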
MOTIVATION: Predicting protein-ligand binding sites is crucial in studying protein interactions with applications in biotechnology and drug discovery. Two distinct paradigms have emerged for this purpose: sequence-based methods, which leverage protein sequence information, and structure-based methods, which rely on the three-dimensional (3D) structure of the protein. Here, we analyze a hybrid approach that combines the strengths of both paradigms by integrating two recent deep learning architectures: protein language models (pLMs) from the sequence-based paradigm and Graph Neural Networks (GNNs) from the structure-based paradigm. Specifically, we construct a residue-level Graph Attention Network (GAT) model based on the protein's 3D structure that uses pre-trained pLM embeddings as node features. This integration enables us to study the interplay between the sequential information encoded in the protein sequence and the spatial relationships within the protein structure on the model performance. RESULTS: By exploiting a benchmark dataset over a range of ligands and ligand types, we have shown that using the structure information consistently enhances the predictive power of the baselines in absolute terms. Nevertheless, as more complex pLMs are used to represent node features, the relative impact of the structure information represented by the GNN architecture diminishes. The above observations suggest that although the use of the experimental protein structure almost always improves the accuracy of the prediction of the binding site, complex pLMs still contain structural information that leads to good predictive performance even without the use of 3D structure. AVAILABILITY: The datasets generated and/or analyzed during the current study, as well as pretrained models are available in the following Zenodo link https://zenodo.org/records/15184302. 
The source code that was used to generate the results of the current study is available in the following GitHub repository https://github.com/hamzagamouh/pt-lm-gnn as well as in the following Zenodo link https://zenodo.org/records/15192327. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Journal online.
- Publication type
- Journal Article MeSH
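The abstract above describes a residue-level graph built from the protein's 3D structure, with pre-trained pLM embeddings as node features. The sketch below covers only the graph construction step, not the attention layers. It assumes one representative coordinate per residue (for example the C-alpha atom) and a distance cutoff for edges. The function name and the 10 Å default are illustrative assumptions and do not come from the linked repository.

```python
from math import dist


def build_residue_graph(residue_coords, embeddings, cutoff=10.0):
    """Build a residue graph with embeddings as node features.

    residue_coords: list of (x, y, z), one representative atom per residue.
    embeddings: list of per-residue feature vectors (e.g. from a pLM).
    cutoff: maximum distance for an edge; the default is an assumption.
    Returns a dict with node features and a directed edge list (both directions).
    """
    if len(residue_coords) != len(embeddings):
        raise ValueError("need exactly one embedding per residue")

    edges = []
    n = len(residue_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if dist(residue_coords[i], residue_coords[j]) <= cutoff:
                # Store both directions so message passing is symmetric.
                edges.append((i, j))
                edges.append((j, i))
    return {"node_features": list(embeddings), "edges": edges}
```

The resulting edge list and feature matrix are the kind of input a graph attention layer from a GNN library would consume.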
BACKGROUND: PsbO, the manganese-stabilising protein, is an indispensable extrinsic subunit of photosystem II. It plays a crucial role in the stabilisation of the water-splitting Mn4CaO5 cluster, which catalyses the oxidation of water to molecular oxygen by using light energy. PsbO was also demonstrated to have a weak GTPase activity that could be involved in the regulation of D1 protein turnover. Our analysis of psbO sequences showed that many angiosperm species express two psbO paralogs, but the pairs of isoforms in one species were not orthologous to pairs of isoforms in distant species. RESULTS: Phylogenetic analysis of 91 psbO sequences from 49 land plant species revealed that psbO duplication occurred many times independently, generally at the roots of modern angiosperm families. In spite of this, the level of isoform divergence was similar in different species. Moreover, mapping of the differences onto the protein tertiary structure showed that the isoforms in individual species differ from each other at similar positions, mostly at the luminally exposed end of the β-barrel structure. Comparison of these differences with the location of differences between PsbOs from diverse angiosperm families indicated various selection pressures in PsbO evolution and potential interaction surfaces on the PsbO structure. CONCLUSIONS: The analyses suggest that similar subfunctionalisation of PsbO isoforms occurred in parallel in various lineages. We speculate that the presence of two PsbO isoforms helps the plants to finely adjust the photosynthetic apparatus in response to variable conditions. This might be mediated by diverse GTPase activity, since the isoform differences predominate near the predicted GTP-binding site.
- MeSH
- Amino Acids metabolism MeSH
- Species Specificity MeSH
- Photosystem II Protein Complex chemistry metabolism MeSH
- Phylogeny * MeSH
- Magnoliopsida genetics metabolism MeSH
- Models, Molecular MeSH
- Open Reading Frames genetics MeSH
- Protein Isoforms chemistry metabolism MeSH
- Genes, Plant MeSH
- Protein Structure, Secondary MeSH
- Amino Acid Sequence MeSH
- Amino Acid Substitution MeSH
- Protein Structure, Tertiary MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Amino Acids MeSH
- Photosystem II Protein Complex MeSH
- photosystem II manganese-stabilizing protein MeSH
- Protein Isoforms MeSH
SUMMARY: We present the cpPredictor webserver, which implements a novel template-based method for prediction of the secondary structure of RNA. The method outperforms available prediction methods as it uses RNA structures of related molecules, either predicted or experimentally identified, as structural templates. The server aims at three major tasks: i) prediction of RNA secondary structures that are difficult to predict by available methods, ii) characterization of uncharacterized RNAs as compatible or incompatible with a chosen template structure and iii) identification of the most relevant structure among different candidate structures of a single RNA ambiguously predicted by available methods. The web server is accompanied by comprehensive documentation. AVAILABILITY AND IMPLEMENTATION: The web server is freely available at http://cppredictor.elixir-czech.cz/. The source code of the cpPredictor algorithm is freely available from the webserver under the Apache License, Version 2.0.
- MeSH
- Algorithms MeSH
- Internet MeSH
- Nucleic Acid Conformation * MeSH
- RNA MeSH
- Protein Structure, Secondary MeSH
- Sequence Analysis, RNA MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- RNA MeSH
In proteins, immunogenic determinants that can induce protein-reactive antipeptide antibodies reside mostly in those parts of the molecule that have a high tendency to form beta-turns. A program for an IBM personal computer which predicts protein immunogenic determinants is described. The program predicts potential immunogenic determinants from protein amino acid sequences according to a Chou-Fasman-based probability of a beta-turn occurrence, p > 1.5 × 10⁻⁴ (P. Y. Chou and G. D. Fasman, 1978, Adv. Enzymol. 47, 46-148). Oncopeptides (whose efficacy in generating protein-reactive antipeptide antibodies has been described) with a beta-turn probability of p > 1.5 × 10⁻⁴ elicited antipeptide antibodies that reacted with the parent oncoprotein at a rate of 96%, thus showing a surprisingly good correlation between the tendency to form a beta-turn and the protein reactivity of antipeptide antibodies. Potential immunogenic determinants were predicted on myohemerythrin and myoglobin.
- MeSH
- Epitopes analysis MeSH
- Hemerythrin analogs & derivatives analysis MeSH
- Myoglobin analysis MeSH
- Computer Simulation MeSH
- Proteins analysis immunology MeSH
- Amino Acid Sequence MeSH
- Publication type
- Journal Article MeSH
- Substance names
- Epitopes MeSH
- Hemerythrin MeSH
- Myoglobin MeSH
- myohemerythrin MeSH
- Proteins MeSH
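The abstract above selects candidate immunogenic determinants by a Chou-Fasman-based beta-turn probability above 1.5 × 10⁻⁴. In the Chou-Fasman scheme, the turn probability of a tetrapeptide is the product of four position-specific residue frequencies. The sketch below applies that product over a sliding window. The function name is hypothetical and the frequency table is left as an input, because the published values are not reproduced in the abstract. How the original program post-processes the windows is not described and is not modeled here.

```python
def predict_turn_windows(sequence, turn_freqs, threshold=1.5e-4):
    """Find tetrapeptide windows whose beta-turn probability exceeds a threshold.

    sequence: amino acid sequence as one-letter codes.
    turn_freqs: dict mapping residue -> (f_i, f_i+1, f_i+2, f_i+3), the
        position-specific turn frequencies; the caller must supply a
        published table, none is bundled here.
    threshold: cutoff quoted in the abstract (1.5e-4).
    Returns a list of (start_index, probability) for windows above threshold.
    """
    hits = []
    for start in range(len(sequence) - 3):
        # Multiply the frequency of each residue at its position in the turn.
        p = 1.0
        for offset in range(4):
            p *= turn_freqs[sequence[start + offset]][offset]
        if p > threshold:
            hits.append((start, p))
    return hits
```

Start indices are zero-based, so a hit at index 1 covers residues 2 to 5 of the sequence in conventional numbering.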
The TP53 gene is the most frequently mutated gene in human cancer and p53 protein plays a crucial role in gene expression and cancer protection. Its role is manifested by interactions with other proteins and DNA. p53 is a transcription factor that binds to DNA response elements (REs). Due to the palindromic nature of the consensus binding site, several p53-REs have the potential to form cruciform structures. However, the influence of cruciform formation on the activity of p53-REs has not been evaluated. Therefore, we prepared sets of p53-REs with identical theoretical binding affinity in their linear state, but different probabilities to form extra helical structures, for in vitro and in vivo analyses. Then we evaluated the presence of cruciform structures when inserted into plasmid DNA and employed a yeast-based assay to measure transactivation potential of these p53-REs cloned at a chromosomal locus in isogenic strains. We show that transactivation in vivo correlated more with relative propensity of an RE to form cruciforms than to its predicted in vitro DNA binding affinity for wild type p53. Structural features of p53-REs could therefore be an important determinant of p53 transactivation function.
- Keywords
- Cruciform structure, Inverted repeat, Protein-DNA interaction, p53 protein
- MeSH
- Transcriptional Activation MeSH
- Chromatin genetics MeSH
- Yeasts genetics MeSH
- Mutation MeSH
- Tumor Suppressor Protein p53 chemistry genetics metabolism MeSH
- Inverted Repeat Sequences * MeSH
- Computer Simulation MeSH
- Response Elements * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Substance names
- Chromatin MeSH
- Tumor Suppressor Protein p53 MeSH
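The p53 abstract above attributes the cruciform-forming potential of response elements to the palindromic nature of the consensus binding site. As a rough illustration, the sketch below measures how closely a DNA sequence matches its own reverse complement. This symmetry fraction is a simplification introduced here. It is not the propensity measure used in the study, which the abstract does not specify, and the function names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def symmetry_fraction(seq):
    """Fraction of positions at which seq equals its reverse complement.

    A value of 1.0 means a perfect palindrome (inverted repeat without
    spacer); lower values mean fewer positions could pair in a hairpin arm.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    rc = reverse_complement(seq)
    matches = sum(a == b for a, b in zip(seq, rc))
    return matches / len(seq)
```

For example, the decamer GGGCATGCCC is its own reverse complement and scores 1.0, while GGGCATGTTT keeps only the central CATG symmetric and scores 0.4.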