MeSH: Databases, Protein

85 hits in Articles

Article

Multiplexing methods in dynamic protein crystallography

Authors
Klureza, Margaret A: Institute for Nanostructure and Solid State Physics, University of Hamburg, HARBOR, Hamburg, Germany
Pulnova, Yelyzaveta: ELI Beamlines, Extreme Light Infrastructure, Dolni Brezany, Czechia
von Stetten, David: European Molecular Biology Laboratory (EMBL), Hamburg, Germany
Owen, Robin L: Diamond Light Source Ltd, Harwell Science and Innovation Campus, Didcot, Oxfordshire, United Kingdom
Beddard, Godfrey S: School of Chemistry, University of Edinburgh, David Brewster Road, United Kingdom; School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, United Kingdom
Pearson, Arwen R: Institute for Nanostructure and Solid State Physics, University of Hamburg, HARBOR, Hamburg, Germany
Yorke, Briony A: School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, United Kingdom. Electronic address: B.A.Yorke@leeds.ac.uk

Methods in enzymology. 2024 ; 709 (-) : 177-206. [pub] 20241024

Methods Enzymol
ISSN 1557-7988
Medvik
Source

Time-resolved X-ray crystallography experiments were first performed in the 1980s, yet they remained a niche technique for decades. With the recent advent of X-ray free electron laser (XFEL) sources and serial crystallographic techniques, time-resolved crystallography has received renewed interest and has become more accessible to a wider user base. Despite this, time-resolved structures represent <1% of models deposited in the Worldwide Protein Data Bank, indicating that the tools and techniques currently available require further development before such experiments can become truly routine. In this chapter, we demonstrate how applying data multiplexing to time-resolved crystallography can enhance the achievable time resolution at moderately intense monochromatic X-ray sources, ranging from synchrotrons to bench-top sources. We discuss the principles of multiplexing, where this technique may be advantageous, potential pitfalls, and experimental design considerations.

MeSH
Databases, Protein MeSH
Protein Conformation MeSH
Crystallography, X-Ray methods MeSH
Models, Molecular MeSH
Proteins * chemistry MeSH
Synchrotrons MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, U.S. Gov't, Non-P.H.S. MeSH

Online article

Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Scientific data. 2024 ; 11 (1) : 1032. [pub] 20240927

Sci Data
ISSN 2052-4463

We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
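The reported F1-measure can be cross-checked against the stated precision and recall, since F1 is their harmonic mean. The following is a minimal Python sketch of that arithmetic only; the function name `f1_score` is illustrative and is not taken from the article or its code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract above.
reported_precision = 0.90
reported_recall = 0.92

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))  # 0.91, consistent with the reported F1-measure
```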

Online article

Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures

Nature methods. 2024 ; 21 (10) : 1947-1957. [pub] 20240918

Nat Methods
ISSN 1548-7105

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link disparate data types: to 'map' variants onto protein structures, to better understand how the variation causes disease, and thereby design therapeutics. Here we present the Genomics 2 Proteins portal ( https://g2p.broadinstitute.org/ ): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores) as well as protein structures beyond those in databases to establish the connection between genomics and proteins. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotypes.

Online article

Understanding bacterial pathogen diversity: A proteogenomic analysis and use of an array of genome assemblies to identify novel virulence factors of the honey bee bacterial pathogen Paenibacillus larvae

Proteomics. 2024 ; 24 (14) : e2300280. [pub] 20240514

ISSN 1615-9861

Mass spectrometry proteomics data are typically evaluated against publicly available annotated sequences, but the proteogenomics approach is a useful alternative. A single genome is commonly utilized in custom proteomic and proteogenomic data analysis. We pose the question of whether utilizing numerous different genome assemblies in a search database would be beneficial. We reanalyzed raw data from the exoprotein fraction of four reference Enterobacterial Repetitive Intergenic Consensus (ERIC) I-IV genotypes of the honey bee bacterial pathogen Paenibacillus larvae and evaluated them against three reference databases (from NCBI-protein, RefSeq, and UniProt) together with an array of protein sequences generated by six-frame direct translation of 15 genome assemblies from GenBank. The wide search yielded 453 protein hits/groups, which UpSet analysis categorized into 50 groups based on the success of protein identification by the 18 database components. Nine hits that were not identified by a unique peptide were not considered for marker selection, which discarded the only protein that was not identified by the reference databases. We propose that the variability in successful identifications between genome assemblies is useful for marker mining. The results suggest that various strains of P. larvae can exhibit specific traits that set them apart from the established genotypes ERIC I-V.

MeSH
Bacterial Proteins * genetics metabolism MeSH
Databases, Protein MeSH
Virulence Factors * genetics metabolism MeSH
Genome, Bacterial * genetics MeSH
Paenibacillus larvae * genetics pathogenicity metabolism MeSH
Proteogenomics * methods MeSH
Proteomics methods MeSH
Bees microbiology MeSH
Animals MeSH
Check Tag
Animals MeSH
Publication type
Journal Article MeSH

Article

AHoJ-DB: A PDB-wide Assignment of apo & holo Relationships Based on Individual Protein-Ligand Interactions

Journal of molecular biology. 2024 ; 436 (17) : 168545. [pub] 20240318

J Mol Biol
ISSN 1089-8638

A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein-ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data. Availability: www.apoholo.cz/db.

Online article

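Successes of the AlphaFold project in predicting the three-dimensional structures of proteins (original Czech title: Úspěchy projektu Alphafold v oblasti predikce prostorových struktur proteinů)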

Bioprospect. 2022 ; 32 (1-2) : 3-4.

ISSN 1210-1737

Online article

PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education

Protein science. 2022 ; 31 (10) : e4439. [pub] -

Protein Sci
ISSN 1469-896X

The archiving and dissemination of protein and nucleic acid structures as well as their structural, functional and biophysical annotations is an essential task that enables the broader scientific community to conduct impactful research in multiple fields of the life sciences. The Protein Data Bank in Europe (PDBe; pdbe.org) team develops and maintains several databases and web services to address this fundamental need. From data archiving as a member of the Worldwide PDB consortium (wwPDB; wwpdb.org), to the PDBe Knowledge Base (PDBe-KB; pdbekb.org), we provide data, data-access mechanisms, and visualizations that facilitate basic and applied research and education across the life sciences. Here, we provide an overview of the structural data and annotations that we integrate and make freely available. We describe the web services and data visualization tools we offer, and provide information on how to effectively use or even further develop them. Finally, we discuss the direction of our data services, and how we aim to tackle new challenges that arise from the recent, unprecedented advances in the field of structure determination and protein structure modeling.

Online article

Protein Binder (ProBi) as a New Class of Structurally Robust Non-Antibody Protein Scaffold for Directed Evolution

Viruses. 2021 ; 13 (2) : . [pub] 20210127

ISSN 1999-4915

Engineered small non-antibody protein scaffolds are a promising alternative to antibodies and are especially attractive for use in protein therapeutics and diagnostics. The advantages include smaller size and a more robust, single-domain structural framework with a defined binding surface amenable to mutation. This calls for a more systematic approach in designing new scaffolds suitable for use in one or more methods of directed evolution. We hereby describe a process based on an analysis of protein structures from the Protein Data Bank and their experimental examination. The candidate protein scaffolds were subjected to a thorough screening including computational evaluation of the mutability, and experimental determination of their expression yield in E. coli, solubility, and thermostability. In the next step, we examined several variants of the candidate scaffolds including their wild types and alanine mutants. We proved the applicability of this systematic procedure by selecting a monomeric single-domain human protein with a fold different from previously known scaffolds. The newly developed scaffold, called ProBi (Protein Binder), contains two independently mutable surface patches. We demonstrated its functionality by training it as a binder against human interleukin-10, a medically important cytokine. The procedure yielded scaffold-related variants with nanomolar affinity.

MeSH
Databases, Protein MeSH
Interleukin-10 metabolism MeSH
Protein Conformation MeSH
Computer Simulation MeSH
Protein Engineering MeSH
Proteins chemistry genetics metabolism MeSH
Recombinant Proteins chemistry genetics metabolism MeSH
Ribosomes metabolism MeSH
Directed Molecular Evolution methods MeSH
Amino Acid Sequence MeSH
Protein Stability MeSH
Protein Binding MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Online article

CATH: increased structural coverage of functional space

Nucleic acids research. 2021 ; 49 (D1) : D266-D273. [pub] 20210108

Nucleic Acids Res
ISSN 1362-4962

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.

MeSH
Molecular Sequence Annotation MeSH
COVID-19 epidemiology prevention & control virology MeSH
Databases, Protein statistics & numerical data MeSH
Epidemics MeSH
Internet MeSH
Humans MeSH
Protein Domains * MeSH
Proteins chemistry genetics metabolism MeSH
SARS-CoV-2 genetics metabolism physiology MeSH
Amino Acid Sequence MeSH
Sequence Analysis, Protein methods MeSH
Sequence Homology, Amino Acid MeSH
Viral Proteins chemistry genetics metabolism MeSH
Computational Biology methods statistics & numerical data MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH

Online article

FireProtDB: database of manually curated protein stability data

Nucleic acids research. 2021 ; 49 (D1) : D319-D324. [pub] 20210108

Nucleic Acids Res
ISSN 1362-4962

The majority of naturally occurring proteins have evolved to function under mild conditions inside living organisms. One of the critical obstacles for the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used for narrowing down the mutational landscape. The recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present FireProtDB, a novel database of experimental thermostability data for single-point mutants. The database combines the published datasets, data extracted manually from the recent literature, and the data collected in our laboratory. Its user interface is designed to facilitate both types of expected use: (i) the interactive exploration of individual entries on the level of a protein or mutation and (ii) the construction of highly customized and machine learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb.

MeSH
Molecular Sequence Annotation MeSH
Point Mutation * MeSH
Databases, Protein * MeSH
Datasets as Topic MeSH
Internet MeSH
Models, Molecular MeSH
Proteins chemistry genetics MeSH
Software MeSH
Protein Stability MeSH
Machine Learning statistics & numerical data MeSH
Computational Biology methods MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
