A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery
Status: PubMed-not-MEDLINE. Language: English. Country: United States. Medium: electronic.
Document type: journal articles, preprints
Grant support:
- R01 HD103805 (NICHD NIH HHS, United States)
- RM1 HG010860 (NHGRI NIH HHS, United States)
- U24 HG011449 (NHGRI NIH HHS, United States)
PubMed: 38854034
PubMed Central: PMC11160806
DOI: 10.1101/2024.05.29.24308104
PII: 2024.05.29.24308104
Publication type (MeSH):
- journal articles
- preprints
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
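The abstract describes a phenopacket as a structured record of an individual's phenotypic features, diseases, and genetic findings that software can read as input. As a minimal sketch of what that looks like in practice, the Python below parses a trimmed phenopacket in JSON, separates observed from explicitly excluded HPO terms, and tallies diseases across a collection. It assumes the camelCase JSON field names of Phenopacket Schema v2 (`phenotypicFeatures`, `excluded`, `diseases`, `metaData`). The record itself, including the placeholder disease identifier `OMIM:000000`, is hypothetical and not drawn from phenopacket-store, and it omits required metadata (such as `created`, `createdBy`, and `resources`), so it would not pass full schema validation. The helper names `observed_and_excluded` and `summarize` are illustrative only.

```python
import json
from collections import Counter

# A trimmed, hypothetical phenopacket. Field names assume the GA4GH
# Phenopacket Schema v2 JSON serialization; values are illustrative and
# "OMIM:000000" is a placeholder, not a real disease identifier.
EXAMPLE = """
{
  "id": "example-case-1",
  "subject": {"id": "proband-1", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001250", "label": "Seizure"}},
    {"type": {"id": "HP:0001263", "label": "Global developmental delay"}},
    {"type": {"id": "HP:0000252", "label": "Microcephaly"}, "excluded": true}
  ],
  "diseases": [
    {"term": {"id": "OMIM:000000", "label": "Hypothetical disease"}}
  ],
  "metaData": {"phenopacketSchemaVersion": "2.0"}
}
"""


def observed_and_excluded(packet: dict) -> tuple[list[str], list[str]]:
    """Split HPO term IDs into observed and explicitly excluded features."""
    observed, excluded = [], []
    for feature in packet.get("phenotypicFeatures", []):
        term_id = feature["type"]["id"]
        # An absent "excluded" flag is treated as false (feature observed).
        if feature.get("excluded", False):
            excluded.append(term_id)
        else:
            observed.append(term_id)
    return observed, excluded


def summarize(packets: list[dict]) -> dict:
    """Count cases and distinct disease IDs across a collection of phenopackets."""
    diseases = Counter()
    for packet in packets:
        for disease in packet.get("diseases", []):
            diseases[disease["term"]["id"]] += 1
    return {
        "cases": len(packets),
        "distinct_diseases": len(diseases),
        "cases_per_disease": dict(diseases),
    }


if __name__ == "__main__":
    packet = json.loads(EXAMPLE)
    observed, excluded = observed_and_excluded(packet)
    print("observed:", observed)
    print("excluded:", excluded)
    print(summarize([packet]))
```

Keeping excluded findings separate matters because a phenopacket can record the explicit absence of a feature, not only its presence, and downstream analyses may weigh the two differently.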
Berlin Institute of Health at Charité Universitätsmedizin Berlin Berlin Germany
Brotman Baty Institute for Precision Medicine 1959 NE Pacific Street Box 357657 Seattle WA 98195 USA
Department of Biomedical Informatics University of Colorado Anschutz Medical Campus
Department of Genetics and Genome Biology University of Leicester Leicester UK
Department of Genetics and Genome Sciences University of Connecticut Health Center Farmington CT USA
Department of Ophthalmology University Clinic Marburg Campus Fulda Fulda Germany
Division of Informatics Imaging and Data Science The University of Manchester Manchester UK
ELLIS European Laboratory for Learning and Intelligent Systems
Medical Genetics University of Catania Italy
Morgagni Foundation and Clinic Catania Italy
North West Thames Regional Genetics Service Northwick Park and St Mark's Hospitals London UK
Rare Care Centre Perth Children's Hospital Nedlands WA 6009 Australia
Telethon Kids Institute Nedlands WA 6009 Australia
The Jackson Institute for Genomic Medicine 10 Discovery Drive Farmington CT 06032 USA
University College London Institute of Child Health London United Kingdom
University of Leicester Leicester UK
University of North Carolina at Chapel Hill Chapel Hill NC USA
Utrecht University Utrecht the Netherlands
William Harvey Research Institute Queen Mary University of London London UK