A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery
Language: English
Country: United States
Media: print-electronic
Document type: Journal Article
Grant support
P30 ES010126 (NIEHS NIH HHS, United States)
R01 HD103805 (NICHD NIH HHS, United States)
R35 HG011297 (NHGRI NIH HHS, United States)
U24 HG011449 (NHGRI NIH HHS, United States)
PubMed: 39394689
PubMed Central: PMC11564936
DOI: 10.1016/j.xhgg.2024.100371
PII: S2666-2477(24)00111-8
Keywords
- global alliance for genomics and health, human phenotype ontology, phenopacket schema
MeSH
- Databases, Genetic
- Phenotype *
- Genomics * methods
- Humans
- Software
Check Tag
- Humans
Publication type
- Journal Article
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can serve as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that classify and stratify patients to identify new diseases and treatments. Testing such software pipelines and algorithms requires a collection of phenopackets, for which there has been a great need. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. It is the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data. The collection will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. The corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
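To illustrate how a phenopacket can act as an input file for downstream software, the following minimal sketch reads a phenopacket in its JSON serialization and separates observed from excluded phenotypic features. The example record is hand-written for illustration and is not taken from Phenopacket Store; the helper name `split_features` is hypothetical, and only the Python standard library is used rather than any phenopacket-specific package.

```python
import json

# Hypothetical, hand-written phenopacket for illustration only;
# it is NOT a record from Phenopacket Store.
EXAMPLE_JSON = """
{
  "id": "example-individual-1",
  "subject": {"id": "individual-1", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001250", "label": "Seizure"}},
    {"type": {"id": "HP:0000252", "label": "Microcephaly"}, "excluded": true}
  ],
  "metaData": {"phenopacketSchemaVersion": "2.0"}
}
"""


def split_features(phenopacket):
    """Return (observed, excluded) lists of (HPO id, label) pairs.

    Assumes the JSON field names of the Phenopacket Schema
    ("phenotypicFeatures", "type", "excluded"); a feature without an
    "excluded" flag is treated as observed.
    """
    observed, excluded = [], []
    for feature in phenopacket.get("phenotypicFeatures", []):
        term = feature["type"]
        pair = (term["id"], term.get("label", ""))
        if feature.get("excluded", False):
            excluded.append(pair)
        else:
            observed.append(pair)
    return observed, excluded


phenopacket = json.loads(EXAMPLE_JSON)
observed, excluded = split_features(phenopacket)
print("observed:", observed)
print("excluded:", excluded)
```

A phenotype-driven prioritization tool would typically consume the observed and excluded term lists in this form; reading many such files in a loop is one way a test corpus could be fed to a pipeline.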
Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany
Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
Department of Genetics, Genomics and Cancer Sciences, University of Leicester, Leicester, UK
Department of Ophthalmology, University Clinic Marburg, Campus Fulda, Fulda, Germany
Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
North West Thames Regional Genetics Service, Northwick Park and St Mark's Hospitals, London, UK
The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
William Harvey Research Institute, Queen Mary University of London, London, UK