MeSH: Databases, Genetic

203 hits in Articles

Online article

A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery

Danis, Daniel: Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Bamshad, Michael J: Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA; Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
Bridges, Yasemin: William Harvey Research Institute, Queen Mary University of London, London, UK
Caballero-Oteyza, Andrés: Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany
Cacheiro, Pilar: William Harvey Research Institute, Queen Mary University of London, London, UK
Carmody, Leigh C: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Chimirri, Leonardo: Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
Chong, Jessica X: Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA; Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
Coleman, Ben: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Dalgleish, Raymond: Department of Genetics, Genomics and Cancer Sciences, University of Leicester, Leicester, UK

HGG advances. 2025 ; 6 (1) : 100371. [pub] 20241010

HGG Adv
ISSN 2666-2477
Medvik
Source

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.

Article

The proteomic code: Novel amino acid residue pairing models "encode" protein folding and protein-protein interactions

Computers in biology and medicine. 2025 ; 190 (-) : 110033. [pub] 20250319

Comput Biol Med
ISSN 1879-0534

Recent advances in protein 3D structure prediction using deep learning have focused on the importance of amino acid residue-residue connections (i.e., pairwise atomic contacts) for accuracy at the expense of mechanistic interpretability. Therefore, we decided to perform a series of analyses based on an alternative framework of residue-residue connections making primary use of the TOP2018 dataset. This framework of residue-residue connections is derived from amino acid residue pairing models both historic and new, all based on genetic principles complemented by relevant biophysical principles. Of these pairing models, three new models (named the GU, Transmuted and Shift pairing models) exhibit the highest observed-over-expected ratios and highest correlations in statistical analyses with various intra- and inter-chain datasets, in comparison to the remaining models. In addition, these new pairing models are universally frequent across different connection ranges, secondary structure connections, and protein sizes. Accordingly, following further statistical and other analyses described herein, we have come to a major conclusion that all three pairing models together could represent the basis of a universal proteomic code (second genetic code) sufficient, in and of itself, to "encode" for both protein folding mechanisms and protein-protein interactions.

MeSH
Amino Acids * chemistry genetics
Databases, Protein
Humans
Models, Molecular *
Proteins * chemistry genetics metabolism
Proteomics *
Protein Folding *
Check Tag
Humans
Publication type
Journal Article

Article

Genomic reanalysis of a pan-European rare-disease resource yields new diagnoses

Nature medicine. 2025 ; 31 (2) : 478-489. [pub] 20250117

Nat Med
ISSN 1546-170X

Genetic diagnosis of rare diseases requires accurate identification and interpretation of genomic variants. Clinical and molecular scientists from 37 expert centers across Europe created the Solve-Rare Diseases Consortium (Solve-RD) resource, encompassing clinical, pedigree and genomic rare-disease data (94.5% exomes, 5.5% genomes), and performed systematic reanalysis for 6,447 individuals (3,592 male, 2,855 female) with previously undiagnosed rare diseases from 6,004 families. We established a collaborative, two-level expert review infrastructure that allowed a genetic diagnosis in 506 (8.4%) families. Of 552 disease-causing variants identified, 464 (84.1%) were single-nucleotide variants or short insertions/deletions. These variants were either located in recently published novel disease genes (n = 67), recently reclassified in ClinVar (n = 187) or reclassified by consensus expert decision within Solve-RD (n = 210). Bespoke bioinformatics analyses identified the remaining 15.9% of causative variants (n = 88). Ad hoc expert review, parallel to the systematic reanalysis, diagnosed 249 (4.1%) additional families for an overall diagnostic yield of 12.6%. The infrastructure and collaborative networks set up by Solve-RD can serve as a blueprint for future further scalable international efforts. The resource is open to the global rare-disease community, allowing phenotype, variant and gene queries, as well as genome-wide discoveries.
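
The percentages quoted in the abstract above can be re-derived from the counts it reports. The short Python sketch below is a reader's consistency check, not part of the study; the constant and function names are illustrative assumptions, and only the numbers come from the abstract.

```python
# Counts copied from the abstract; all names here are illustrative.
FAMILIES = 6004
DIAGNOSED_SYSTEMATIC = 506      # families diagnosed by systematic reanalysis
DIAGNOSED_AD_HOC = 249          # additional families diagnosed by ad hoc review
CAUSATIVE_VARIANTS = 552
SNV_INDEL_BY_SOURCE = {
    "novel_disease_gene": 67,
    "clinvar_reclassified": 187,
    "solve_rd_consensus": 210,
}
BESPOKE_VARIANTS = 88


def pct(part: int, whole: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


snv_indel_total = sum(SNV_INDEL_BY_SOURCE.values())

checks = {
    "systematic yield (%)": pct(DIAGNOSED_SYSTEMATIC, FAMILIES),
    "ad hoc yield (%)": pct(DIAGNOSED_AD_HOC, FAMILIES),
    "overall yield (%)": pct(DIAGNOSED_SYSTEMATIC + DIAGNOSED_AD_HOC, FAMILIES),
    "SNV/indel share (%)": pct(snv_indel_total, CAUSATIVE_VARIANTS),
    "bespoke-analysis share (%)": pct(BESPOKE_VARIANTS, CAUSATIVE_VARIANTS),
}

for label, value in checks.items():
    print(f"{label}: {value}")
```

Note that the overall yield is computed from the summed family counts (506 + 249) rather than by adding the two rounded percentages.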

Article

Curation of gene-disease relationships in primary antibody deficiencies using the ClinGen validation framework

Journal of allergy and clinical immunology. 2025 ; 155 (5) : 1647-1663. [pub] 20250116

J Allergy Clin Immunol
ISSN 1097-6825

BACKGROUND: The Clinical Genome Resource (ClinGen) is an international collaborative effort among scientists and clinicians, diagnostic and research laboratories, and the patient community. Using a standardized framework, ClinGen has established guidelines to classify gene-disease relationships as definitive, strong, moderate, and limited on the basis of available scientific and clinical evidence. When the genetic and functional evidence for a gene-disease relationship has conflicting interpretations or contradictory evidence, they can be disputed or refuted. OBJECTIVE: We assessed genes related to primary antibody deficiencies. METHODS: The ClinGen Antibody Deficiencies Gene Curation Expert Panel, using the ClinGen framework, classified genes related to primary antibody deficiency that primarily affect B-cell development and/or function, and that account for the largest proportion of inborn errors of immunity or primary immunodeficiencies. RESULTS: The expert panel curated a total of 65 genes associated with humoral immune defects to validate 74 gene-disease relationships. Of these, 40 were classified as definitive, 1 as strong, 16 as moderate, 15 as limited, and 2 as disputed. The curation process involved reviewing 490 patient records and 3546 associated human phenotype ontology entries. The 3 most frequently observed terms related to primary antibody deficiency were decreased circulating antibody level, pneumonia, and lymphadenopathy. CONCLUSIONS: These curations (publicly available at ClinicalGenome.org) represent the first effort to provide a comprehensive genetic and phenotypic revision of genetic disorders affecting humoral immunity, as reviewed and approved by experts in the field.

MeSH
Databases, Genetic
Genetic Predisposition to Disease
Humans
Primary Immunodeficiency Diseases * genetics
Immunologic Deficiency Syndromes * genetics
Check Tag
Humans
Publication type
Journal Article

Online article

MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration

Nucleic acids research. 2025 ; 53 (D1) : D678-D690. [pub] 20250106

Nucleic Acids Res
ISSN 1362-4962

Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/.

MeSH
Molecular Sequence Annotation
Biological Products metabolism chemistry
Biosynthetic Pathways genetics
Databases, Genetic *
Data Curation
Multigene Family *
Publication type
Journal Article

Article

Multiplexing methods in dynamic protein crystallography

Methods in enzymology. 2024 ; 709 (-) : 177-206. [pub] 20241024

Methods Enzymol
ISSN 1557-7988

Time-resolved X-ray crystallography experiments were first performed in the 1980s, yet they remained a niche technique for decades. With the recent advent of X-ray free electron laser (XFEL) sources and serial crystallographic techniques, time-resolved crystallography has received renewed interest and has become more accessible to a wider user base. Despite this, time-resolved structures represent < 1 % of models deposited in the world-wide Protein Data Bank, indicating that the tools and techniques currently available require further development before such experiments can become truly routine. In this chapter, we demonstrate how applying data multiplexing to time-resolved crystallography can enhance the achievable time resolution at moderately intense monochromatic X-ray sources, ranging from synchrotrons to bench-top sources. We discuss the principles of multiplexing, where this technique may be advantageous, potential pitfalls, and experimental design considerations.

MeSH
Databases, Protein
Protein Conformation
Crystallography, X-Ray methods
Models, Molecular
Proteins * chemistry
Synchrotrons
Publication type
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

Online article

Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Scientific data. 2024 ; 11 (1) : 1032. [pub] 20240927

Sci Data
ISSN 2052-4463

We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
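
The F1-measure reported in the abstract above is the harmonic mean of precision and recall. As a minimal sketch (the function name is an assumption, and the inputs are the rounded values from the abstract, so this checks only approximate consistency), the relationship can be verified in Python:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
f1 = f1_score(0.90, 0.92)
print(round(f1, 2))
```

Rounded to two decimals, the result agrees with the 0.91 stated in the abstract.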

Online article

Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures

Nature methods. 2024 ; 21 (10) : 1947-1957. [pub] 20240918

Nat Methods
ISSN 1548-7105

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link disparate data types - to 'map' variants onto protein structures, to better understand how the variation causes disease, and thereby design therapeutics. Here we present the Genomics 2 Proteins portal ( https://g2p.broadinstitute.org/ ): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores) as well as the protein structure beyond databases to establish the connection between genomics to proteins. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotypes.

Online article

The 1+Million Genomes Minimal Dataset for Cancer

Nature genetics. 2024 ; 56 (5) : 733-736. [pub] -

Nat Genet
ISSN 1546-1718

Online article

EUKARYOME: the rRNA gene reference database for identification of all eukaryotes

Database. 2024 ; 2024 (-) : . [pub] 20240612

Database (Oxford)
ISSN 1758-0463

Molecular identification of micro- and macroorganisms based on nuclear markers has revolutionized our understanding of their taxonomy, phylogeny and ecology. Today, research on the diversity of eukaryotes in global ecosystems heavily relies on nuclear ribosomal RNA (rRNA) markers. Here, we present the research community-curated reference database EUKARYOME for nuclear ribosomal 18S rRNA, internal transcribed spacer (ITS) and 28S rRNA markers for all eukaryotes, including metazoans (animals), protists, fungi and plants. It is particularly useful for the identification of arbuscular mycorrhizal fungi as it bridges the four commonly used molecular markers: ITS1, ITS2, 18S V4-V5 and 28S D1-D2 subregions. The key benefits of this database over other annotated reference sequence databases are that it is not restricted to certain taxonomic groups and it includes all rRNA markers. EUKARYOME also offers a number of reference long-read sequences that are derived from (meta)genomic and (meta)barcoding, a unique feature that can be used for taxonomic identification and chimera control of third-generation, long-read, high-throughput sequencing data. Taxonomic assignments of rRNA genes in the database are verified based on phylogenetic approaches. The reference datasets are available in multiple formats from the project homepage, http://www.eukaryome.org.
