JavaScript NENÍ povolen !

Prosím povolte JavaScript.

* Zobrazit nápovědu

Reset

Autor: Gáborová, Romana

4 záznamů v PubMed Filtry

Článek

Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Vollmar, Melanie
Autor Vollmar, Melanie ORCID Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. melaniev@ebi.ac.uk
Tirunagari, Santosh
Autor Tirunagari, Santosh Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Harrus, Deborah
Autor Harrus, Deborah Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Armstrong, David
Autor Armstrong, David ORCID Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Gáborová, Romana
Autor Gáborová, Romana CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 5, 62500, Brno, Czech Republic
Gupta, Deepti
Autor Gupta, Deepti Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Afonso, Marcelo Querino Lima
Autor Afonso, Marcelo Querino Lima Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Evans, Genevieve
Autor Evans, Genevieve Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Velankar, Sameer
Autor Velankar, Sameer ORCID Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Scientific data. 2024 Sep 27 ; 11 (1) : 1032. [epub] 20240927

Sci Data
ISSN 2052-4463
Zdroj

We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.

Článek

Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data

Scientific data. 2023 Dec 01 ; 10 (1) : 853. [epub] 20231201

Sci Data
ISSN 2052-4463
Zdroj

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions for assemblies in the PDB, complicating the identification of instances representing the same assembly. In this study, we introduce a method leveraging resources external to PDB, such as the Complex Portal, UniProt and Gene Ontology, to describe assemblies and contextualise them within their biological settings accurately. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB's FAIR attributes, fostering more effective basic and translational research and scientific education.

Článek

PDBe and PDBe-KB: Providing high-quality, up-to-date and integrated resources of macromolecular structures to support basic and applied research and education

Protein science. 2022 Oct ; 31 (10) : e4439.

Protein Sci
ISSN 1469-896X | 0961-8368
Zdroj

The archiving and dissemination of protein and nucleic acid structures as well as their structural, functional and biophysical annotations is an essential task that enables the broader scientific community to conduct impactful research in multiple fields of the life sciences. The Protein Data Bank in Europe (PDBe; pdbe.org) team develops and maintains several databases and web services to address this fundamental need. From data archiving as a member of the Worldwide PDB consortium (wwPDB; wwpdb.org), to the PDBe Knowledge Base (PDBe-KB; pdbekb.org), we provide data, data-access mechanisms, and visualizations that facilitate basic and applied research and education across the life sciences. Here, we provide an overview of the structural data and annotations that we integrate and make freely available. We describe the web services and data visualization tools we offer, and provide information on how to effectively use or even further develop them. Finally, we discuss the direction of our data services, and how we aim to tackle new challenges that arise from the recent, unprecedented advances in the field of structure determination and protein structure modeling.

Klíčová slova
bioinformatics, databases, protein data bank, structural biology,
MeSH
databáze proteinů MeSH
konformace proteinů MeSH
nukleové kyseliny * MeSH
proteiny * chemie MeSH
Publikační typ
časopisecké články MeSH
práce podpořená grantem MeSH
Geografické názvy
Evropa MeSH
Názvy látek
nukleové kyseliny * MeSH
proteiny * MeSH

Článek

PDBe: improved findability of macromolecular structure data in the PDB

Nucleic acids research. 2020 Jan 08 ; 48 (D1) : D335-D343.

Nucleic Acids Res
ISSN 1362-4962 | 0305-1048
Zdroj

The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.

* Zobrazit nápovědu

Upřesnit dle MeSH