A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery

2024 May 29. [Epub] 20240529

Status: PubMed-not-MEDLINE. Language: English. Country: United States. Medium: electronic

Document type: journal articles, preprints

Persistent link   https://www.medvik.cz/link/pmid38854034

Grant support
R01 HD103805 NICHD NIH HHS - United States
RM1 HG010860 NHGRI NIH HHS - United States
U24 HG011449 NHGRI NIH HHS - United States

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
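As a rough illustration of how a phenopacket might serve as input to downstream software, the sketch below parses one phenopacket in JSON form and extracts the observed phenotypic features and the disease term. The field names (`phenotypicFeatures`, `type`, `excluded`, `diseases`, `term`) follow the GA4GH Phenopacket Schema version 2 JSON representation. The example record is hand-written for illustration and is not taken from phenopacket-store, and the helper functions `observed_features` and `disease_counts` are hypothetical names. A given corpus may record the diagnosis under `interpretations` rather than `diseases`, so this is a sketch under stated assumptions, not the authors' pipeline.

```python
import json
from collections import Counter

# Hand-written illustrative phenopacket (NOT taken from phenopacket-store).
EXAMPLE_JSON = """
{
  "id": "example-individual-1",
  "subject": {"id": "individual-1"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
    {"type": {"id": "HP:0001083", "label": "Ectopia lentis"}},
    {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}
  ],
  "diseases": [
    {"term": {"id": "OMIM:154700", "label": "Marfan syndrome"}}
  ]
}
"""


def observed_features(phenopacket):
    """Return (id, label) pairs for features that were observed, not excluded."""
    return [
        (feature["type"]["id"], feature["type"]["label"])
        for feature in phenopacket.get("phenotypicFeatures", [])
        if not feature.get("excluded", False)
    ]


def disease_counts(phenopackets):
    """Count how many phenopackets list each disease term identifier."""
    counts = Counter()
    for phenopacket in phenopackets:
        for disease in phenopacket.get("diseases", []):
            counts[disease["term"]["id"]] += 1
    return counts


packet = json.loads(EXAMPLE_JSON)
features = observed_features(packet)
counts = disease_counts([packet])

for term_id, label in features:
    print(term_id, label)
print(dict(counts))
```

Passing a list of many parsed phenopackets to `disease_counts` would give a per-disease tally of the kind summarized in the abstract (number of cases per disease), assuming the diagnoses are stored in the `diseases` field.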

Berlin Institute of Health at Charité Universitätsmedizin Berlin Berlin Germany

Brotman Baty Institute for Precision Medicine 1959 NE Pacific Street Box 357657 Seattle WA 98195 USA

Department of Biomedical Informatics University of Colorado Anschutz Medical Campus

Department of Genetics and Genome Biology University of Leicester Leicester UK

Department of Genetics and Genome Sciences University of Connecticut Health Center Farmington CT USA

Department of Immunology 2nd Faculty of Medicine Charles University and University Hospital in Motol Prague Czech Republic

Department of Immunology National Institute of Women's Children's and Adolescents' Health Fernandes Figueira Rio de Janeiro Brazil

Department of Ophthalmology University Clinic Marburg Campus Fulda Fulda Germany

Department of Paediatric Immunology Great Ormond Street Hospital for Children NHS Foundation Trust London UK

Department of Pediatrics Division of Genetic Medicine Seattle Children's Hospital Seattle WA 98195 USA

Department of Pediatrics Division of Genetic Medicine University of Washington 1959 NE Pacific Street Box 357371 Seattle WA 98195 USA

Department of Pediatrics Faculty of Medicine and University Hospital Carl Gustav Carus Technische Universität Dresden Dresden Germany

Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley CA USA

Division of Informatics Imaging and Data Science The University of Manchester Manchester UK

ELLIS European Laboratory for Learning and Intelligent Systems

High Complexity Laboratory National Institute of Women's Children's and Adolescents' Health Fernandes Figueira Rio de Janeiro Brazil

Medical Genetics University of Catania Italy

Morgagni foundation and Clinic Catania Italy

North West Thames Regional Genetics Service Northwick Park and St Mark's Hospitals London UK

Rare Care Centre Perth Children's Hospital Nedlands WA 6009 Australia

SingHealth Duke NUS Institute of Precision Medicine 5 Hospital Drive Level 9 Singapore 169609 Singapore

Telethon Kids Institute Nedlands WA 6009 Australia

The Jackson Institute for Genomic Medicine 10 Discovery Drive Farmington CT 06032 USA

University Center for Rare Diseases Faculty of Medicine and University Hospital Carl Gustav Carus Technische Universität Dresden Dresden Germany

University College London Institute of Child Health London United Kingdom

University of Leicester Leicester UK

University of North Carolina at Chapel Hill Chapel Hill NC USA

Utrecht University Utrecht the Netherlands

William Harvey Research Institute Queen Mary University of London London UK
