41 hits in Medvik

Online article

A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery

... includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes ...

Danis, Daniel: Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany; The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Bamshad, Michael J: Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA; Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
Bridges, Yasemin: William Harvey Research Institute, Queen Mary University of London, London, UK
Caballero-Oteyza, Andrés: Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany
Cacheiro, Pilar: William Harvey Research Institute, Queen Mary University of London, London, UK
Carmody, Leigh C: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Chimirri, Leonardo: Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
Chong, Jessica X: Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA; Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
Coleman, Ben: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
Dalgleish, Raymond: Department of Genetics, Genomics and Cancer Sciences, University of Leicester, Leicester, UK

HGG advances. 2025 ; 6 (1) : 100371. [pub] 20241010

HGG Adv
ISSN 2666-2477

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
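
To make the term "phenopacket" concrete, the following is a minimal Python sketch of what one case-level record might look like once loaded from JSON. The top-level field names (`id`, `subject`, `phenotypicFeatures`, `diseases`, `metaData`) follow the GA4GH Phenopacket Schema v2 as it is usually serialized; the record is deliberately incomplete, every value in it is a made-up placeholder rather than data from Phenopacket Store, and the helper `observed_hpo_terms` is a hypothetical name introduced only for this illustration.

```python
import json

# A hypothetical, incomplete phenopacket. All values are illustrative only
# and are not taken from Phenopacket Store.
phenopacket = {
    "id": "example-case-1",
    "subject": {"id": "proband-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        {"type": {"id": "HP:0001263", "label": "Global developmental delay"}},
        # "excluded" marks a feature that was explicitly ruled out.
        {"type": {"id": "HP:0000252", "label": "Microcephaly"}, "excluded": True},
    ],
    # Placeholder disease identifier, not a real OMIM entry.
    "diseases": [{"term": {"id": "OMIM:000000", "label": "Placeholder disease"}}],
    "metaData": {"createdBy": "example-curator", "phenopacketSchemaVersion": "2.0"},
}


def observed_hpo_terms(packet):
    """Return the HPO identifiers of features that were observed (not excluded)."""
    return [
        feature["type"]["id"]
        for feature in packet.get("phenotypicFeatures", [])
        if not feature.get("excluded", False)
    ]


print(json.dumps(observed_hpo_terms(phenopacket)))
```

A phenotype-driven prioritization tool would typically consume the observed and excluded terms separately, which is why the sketch filters on the `excluded` flag instead of flattening the list.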

Article

Curation of gene-disease relationships in primary antibody deficiencies using the ClinGen validation framework

... Using a standardized framework, ClinGen has established guidelines to classify gene-disease relationships ...

Journal of allergy and clinical immunology. 2025 ; 155 (5) : 1647-1663. [pub] 20250116

J Allergy Clin Immunol
ISSN 1097-6825

BACKGROUND: The Clinical Genome Resource (ClinGen) is an international collaborative effort among scientists and clinicians, diagnostic and research laboratories, and the patient community. Using a standardized framework, ClinGen has established guidelines to classify gene-disease relationships as definitive, strong, moderate, and limited on the basis of available scientific and clinical evidence. When the genetic and functional evidence for a gene-disease relationship has conflicting interpretations or contradictory evidence, the relationship can be classified as disputed or refuted. OBJECTIVE: We assessed genes related to primary antibody deficiencies. METHODS: The ClinGen Antibody Deficiencies Gene Curation Expert Panel, using the ClinGen framework, classified genes related to primary antibody deficiency that primarily affect B-cell development and/or function, and that account for the largest proportion of inborn errors of immunity or primary immunodeficiencies. RESULTS: The expert panel curated a total of 65 genes associated with humoral immune defects to validate 74 gene-disease relationships. Of these, 40 were classified as definitive, 1 as strong, 16 as moderate, 15 as limited, and 2 as disputed. The curation process involved reviewing 490 patient records and 3546 associated human phenotype ontology entries. The 3 most frequently observed terms related to primary antibody deficiency were decreased circulating antibody level, pneumonia, and lymphadenopathy. CONCLUSIONS: These curations (publicly available at ClinicalGenome.org) represent the first effort to provide a comprehensive genetic and phenotypic revision of genetic disorders affecting humoral immunity, as reviewed and approved by experts in the field.
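
The classification counts reported in the RESULTS can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the abstract, while the dictionary layout and variable names are assumptions made for the example.

```python
# Classification counts as reported in the abstract's RESULTS section.
counts = {
    "definitive": 40,
    "strong": 1,
    "moderate": 16,
    "limited": 15,
    "disputed": 2,
}

# The five categories should add up to the 74 curated gene-disease relationships.
total = sum(counts.values())
print(total)  # 74

# Share of each category, as a percentage of all curated relationships.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The sum confirms that the five reported categories account for all 74 relationships, with just over half classified as definitive.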

MeSH
Databases, Genetic MeSH
Genetic Predisposition to Disease MeSH
Humans MeSH
Primary Immunodeficiency Diseases * genetics MeSH
Immunologic Deficiency Syndromes * genetics MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH

Online article

MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration

... biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes ...

Nucleic acids research. 2025 ; 53 (D1) : D678-D690. [pub] 20250106

Nucleic Acids Res
ISSN 1362-4962

Specialized or secondary metabolites are small molecules of biological origin, often showing potent biological activities with applications in agriculture, engineering and medicine. Usually, the biosynthesis of these natural products is governed by sets of co-regulated and physically clustered genes known as biosynthetic gene clusters (BGCs). To share information about BGCs in a standardized and machine-readable way, the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard and repository was initiated in 2015. Since its conception, MIBiG has been regularly updated to expand data coverage and remain up to date with innovations in natural product research. Here, we describe MIBiG version 4.0, an extensive update to the data repository and the underlying data standard. In a massive community annotation effort, 267 contributors performed 8304 edits, creating 557 new entries and modifying 590 existing entries, resulting in a new total of 3059 curated entries in MIBiG. Particular attention was paid to ensuring high data quality, with automated data validation using a newly developed custom submission portal prototype, paired with a novel peer-reviewing model. MIBiG 4.0 also takes steps towards a rolling release model and a broader involvement of the scientific community. MIBiG 4.0 is accessible online at https://mibig.secondarymetabolites.org/.

MeSH
Molecular Sequence Annotation MeSH
Biological Products metabolism chemistry MeSH
Biosynthetic Pathways genetics MeSH
Databases, Genetic * MeSH
Data Curation MeSH
Multigene Family * MeSH
Publication type
Journal Article MeSH

Online article

Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

... We present a novel system that leverages curators in the loop to develop a dataset and model for detecting ...

Scientific data. 2024 ; 11 (1) : 1032. [pub] 20240927

Sci Data
ISSN 2052-4463

We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
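
The reported F1-measure can be reproduced from the stated precision and recall, since F1 is their harmonic mean. The sketch below assumes the standard definition F1 = 2PR / (P + R); the function name `f1_score` is chosen for this example and does not refer to the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract.
f1 = f1_score(0.90, 0.92)
print(round(f1, 2))  # 0.91
```

The result agrees with the 0.91 quoted in the abstract, so the three reported metrics are mutually consistent.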

Online article

EUKARYOME: the rRNA gene reference database for identification of all eukaryotes

... Here, we present the research community-curated reference database EUKARYOME for nuclear ribosomal 18S ...

Database. 2024 ; 2024 (-) : . [pub] 20240612

Database (Oxford)
ISSN 1758-0463

Molecular identification of micro- and macroorganisms based on nuclear markers has revolutionized our understanding of their taxonomy, phylogeny and ecology. Today, research on the diversity of eukaryotes in global ecosystems heavily relies on nuclear ribosomal RNA (rRNA) markers. Here, we present the research community-curated reference database EUKARYOME for nuclear ribosomal 18S rRNA, internal transcribed spacer (ITS) and 28S rRNA markers for all eukaryotes, including metazoans (animals), protists, fungi and plants. It is particularly useful for the identification of arbuscular mycorrhizal fungi as it bridges the four commonly used molecular markers: ITS1, ITS2, 18S V4-V5 and 28S D1-D2 subregions. The key benefits of this database over other annotated reference sequence databases are that it is not restricted to certain taxonomic groups and it includes all rRNA markers. EUKARYOME also offers a number of reference long-read sequences that are derived from (meta)genomic and (meta)barcoding, a unique feature that can be used for taxonomic identification and chimera control of third-generation, long-read, high-throughput sequencing data. Taxonomic assignments of rRNA genes in the database are verified based on phylogenetic approaches. The reference datasets are available in multiple formats from the project homepage, http://www.eukaryome.org.

Article

AHoJ-DB: A PDB-wide Assignment of apo & holo Relationships Based on Individual Protein-Ligand Interactions

... With the recent explosion in the field of structural biology, large, curated datasets are urgently needed ...

Journal of molecular biology. 2024 ; 436 (17) : 168545. [pub] 20240318

J Mol Biol
ISSN 1089-8638

A single protein structure is rarely sufficient to capture the conformational variability of a protein. Both bound and unbound (holo and apo) forms of a protein are essential for understanding its geometry and making meaningful comparisons. Nevertheless, docking or drug design studies often still consider only single protein structures in their holo form, which are for the most part rigid. With the recent explosion in the field of structural biology, large, curated datasets are urgently needed. Here, we use a previously developed application (AHoJ) to perform a comprehensive search for apo-holo pairs for 468,293 biologically relevant protein-ligand interactions across 27,983 proteins. In each search, the binding pocket is captured and mapped across existing structures within the same UniProt, and the mapped pockets are annotated as apo or holo, based on the presence or absence of ligands. We assemble the results into a database, AHoJ-DB (www.apoholo.cz/db), that captures the variability of proteins with identical sequences, thereby exposing the agents responsible for the observed differences in geometry. We report several metrics for each annotated pocket, and we also include binding pockets that form at the interface of multiple chains. Analysis of the database shows that about 24% of the binding sites occur at the interface of two or more chains and that less than 50% of the total binding sites processed have an apo form in the PDB. These results can be used to train and evaluate predictors, discover potentially druggable proteins, and reveal protein- and ligand-specific relationships that were previously obscured by intermittent or partial data. Availability: www.apoholo.cz/db.
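
The apo/holo annotation rule described in the abstract (a mapped pocket is holo when a ligand is present and apo when none is) can be sketched in Python as follows. This is not the AHoJ implementation, which may apply further criteria; the `MappedPocket` class, its fields, and the structure identifiers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MappedPocket:
    """A binding pocket mapped onto one structure (hypothetical data model)."""
    structure_id: str
    ligands: List[str] = field(default_factory=list)


def annotate(pocket: MappedPocket) -> str:
    """Label a pocket 'holo' if any ligand occupies it, otherwise 'apo'."""
    return "holo" if pocket.ligands else "apo"


# Made-up structures sharing the same pocket.
pockets = [
    MappedPocket("struct_A", ligands=["ATP"]),
    MappedPocket("struct_B"),
]
labels = {p.structure_id: annotate(p) for p in pockets}
print(labels)
```

Grouping such labels by protein would then show which binding sites have both forms available, the kind of summary behind the abstract's statement that fewer than 50% of processed sites have an apo form in the PDB.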

Online article

Delineation of functionally essential protein regions for 242 neurodevelopmental genes

... However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance ...

Brain. 2023 ; 146 (2) : 519-533. [pub] 20230213

ISSN 1460-2156

Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in this or the coming years in clinical trials. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene disorders associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and discovered key inter-molecule interactions in protein complexes that were otherwise not annotated. Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.

MeSH
Child MeSH
Genetic Testing MeSH
Humans MeSH
Intellectual Disability * genetics MeSH
Mutation, Missense MeSH
Mutation genetics MeSH
Neurodevelopmental Disorders * genetics MeSH
Check Tag
Child MeSH
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH

Online article

Genomic analysis of two phlebotomine sand fly vectors of Leishmania from the New and Old World

... We categorized and curated genes involved in processes important to their roles as disease vectors, including ...

PLoS neglected tropical diseases. 2023 ; 17 (4) : e0010862. [pub] 20230412

PLoS Negl Trop Dis
ISSN 1935-2735

Phlebotomine sand flies are of global significance as important vectors of human disease, transmitting bacterial, viral, and protozoan pathogens, including the kinetoplastid parasites of the genus Leishmania, the causative agents of devastating diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries with hundreds of millions of people at risk around the world. No approved efficacious vaccine exists for leishmaniasis and available therapeutic drugs are either toxic and/or expensive, or the parasites are becoming resistant to the more recently developed drugs. Therefore, sand fly and/or reservoir control are currently the most effective strategies to break transmission. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structures, we sequenced the genomes of two geographically widespread and important sand fly vector species: Phlebotomus papatasi, a vector of Leishmania parasites that cause cutaneous leishmaniasis (distributed in Europe, the Middle East and North Africa), and Lutzomyia longipalpis, a vector of Leishmania parasites that cause visceral leishmaniasis (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will be a foundation on which to base future efforts to prevent vector-borne transmission of Leishmania parasites.

MeSH
Genomics MeSH
Leishmania * genetics MeSH
Leishmaniasis, Cutaneous * MeSH
Humans MeSH
Phlebotomus * parasitology MeSH
Psychodidae * parasitology MeSH
Animals MeSH
Check Tag
Humans MeSH
Animals MeSH
Publication type
Journal Article MeSH
Research Support, N.I.H., Extramural MeSH

Online article

The Allele Catalog Tool: a web-based interactive tool for allele discovery and analysis

... takes VCF files to perform imputations, functional effect predictions, and assemble alleles for each gene ...

BMC genomics. 2023 ; 24 (1) : 107. [pub] 20230310

BMC Genomics
ISSN 1471-2164

BACKGROUND: The advancement of sequencing technologies today has made a plethora of whole-genome re-sequenced (WGRS) data publicly available. However, research utilizing the WGRS data without further configuration is nearly impossible. To solve this problem, our research group has developed an interactive Allele Catalog Tool to enable researchers to explore the coding region allelic variation present in over 1,000 re-sequenced accessions each for soybean, Arabidopsis, and maize. RESULTS: The Allele Catalog Tool was designed originally with soybean genomic data and resources. The Allele Catalog datasets were generated using our variant calling pipeline (SnakyVC) and the Allele Catalog pipeline (AlleleCatalog). The variant calling pipeline was developed to process raw sequencing reads in parallel to generate the Variant Call Format (VCF) files, and the Allele Catalog pipeline takes VCF files to perform imputations, functional effect predictions, and assemble alleles for each gene to generate curated Allele Catalog datasets. Both pipelines were utilized to generate the data panels (VCF files and Allele Catalog files) in which the accessions of the WGRS datasets were collected from various sources, currently representing over 1,000 diverse accessions for soybean, Arabidopsis, and maize individually. The main features of the Allele Catalog Tool include data query, visualization of results, categorical filtering, and download functions. Queries are performed from user input, and results are presented in tabular format as summary results by categorical description and genotype results of the alleles for each gene. The categorical information is specific to each species; additionally, available detailed meta-information is provided in modal popups. The genotypic information contains the variant positions, reference or alternate genotypes, the functional effect classes, and the amino-acid changes of each accession. The results can also be downloaded for other research purposes. CONCLUSIONS: The Allele Catalog Tool is a web-based tool that currently supports three species: soybean, Arabidopsis, and maize. The Soybean Allele Catalog Tool is hosted on the SoyKB website (https://soykb.org/SoybeanAlleleCatalogTool/), while the Allele Catalog Tool for Arabidopsis and maize is hosted on the KBCommons website (https://kbcommons.org/system/tools/AlleleCatalogTool/Zmays and https://kbcommons.org/system/tools/AlleleCatalogTool/Athaliana). Researchers can use this tool to connect variant alleles of genes with meta-information of species.

Online article

The Clinical Genome Resource (ClinGen) Familial Hypercholesterolemia Variant Curation Expert Panel consensus guidelines for LDLR variant classification

... accuracy and consistency, the Clinical Genome Resource Familial Hypercholesterolemia (FH) Variant Curation ...

Genetics in medicine. 2022 ; 24 (2) : 293-306. [pub] 20211130

Genet Med
ISSN 1530-0366

PURPOSE: In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published consensus standardized guidelines for sequence-level variant classification in Mendelian disorders. To increase accuracy and consistency, the Clinical Genome Resource Familial Hypercholesterolemia (FH) Variant Curation Expert Panel was tasked with optimizing the existing ACMG/AMP framework for disease-specific classification in FH. In this study, we provide consensus recommendations for the most common FH-associated gene, LDLR, where >2300 unique FH-associated variants have been identified. METHODS: The multidisciplinary FH Variant Curation Expert Panel met in person and through frequent emails and conference calls to develop LDLR-specific modifications of ACMG/AMP guidelines. Through iteration, pilot testing, debate, and commentary, consensus among experts was reached. RESULTS: The consensus LDLR variant modifications to existing ACMG/AMP guidelines include (1) alteration of population frequency thresholds, (2) delineation of loss-of-function variant types, (3) functional study criteria specifications, (4) cosegregation criteria specifications, and (5) specific use and thresholds for in silico prediction tools, among others. CONCLUSION: Establishment of these guidelines as the new standard in the clinical laboratory setting will result in a more evidence-based, harmonized method for LDLR variant classification worldwide, thereby improving the care of patients with FH.

MeSH
Genetic Variation genetics MeSH
Genetic Testing methods MeSH
Genome, Human * genetics MeSH
Genomics methods MeSH
Hyperlipoproteinemia Type II * genetics MeSH
Humans MeSH
Check Tag
Humans MeSH
Publication type
Journal Article MeSH
Research Support, Non-U.S. Gov't MeSH
Research Support, N.I.H., Extramural MeSH
