Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases

Published 2025 Feb 28 [epub 2025 Feb 28]

Status: PubMed-not-MEDLINE. Language: English. Country: United States. Medium: electronic

Document type: journal article, preprint

Persistent link: https://www.medvik.cz/link/pmid40061308

Grant support
R24 OD011883 NIH HHS - United States
RM1 HG010860 NHGRI NIH HHS - United States
U24 HG011449 NHGRI NIH HHS - United States

BACKGROUND: Large language models (LLMs) are increasingly used in the medical field for diverse applications, including differential diagnostic support. The training data used to create LLMs such as the Generative Pretrained Transformer (GPT) are estimated to consist predominantly of English-language texts, but LLMs could be used across the globe to support diagnostics if language barriers could be overcome. Initial pilot studies on the utility of LLMs for differential diagnosis in languages other than English have shown promise, but a large-scale assessment of the relative performance of these models across a variety of European and non-European languages on a comprehensive corpus of challenging rare-disease cases is lacking.

METHODS: We created 4967 clinical vignettes from structured data captured with Human Phenotype Ontology (HPO) terms in the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema. These clinical vignettes span 378 distinct genetic diseases with 2618 associated phenotypic features. We used translations of the HPO together with language-specific templates to generate prompts in English, Chinese, Czech, Dutch, German, Italian, Japanese, Spanish, and Turkish. We applied GPT-4o (version gpt-4o-2024-08-06) to the task of delivering a ranked differential diagnosis using a zero-shot prompt. An ontology-based approach built on the Mondo disease ontology was used to map synonyms and disease subtypes to clinical diagnoses in order to automate evaluation of LLM responses.

FINDINGS: For English, GPT-4o placed the correct diagnosis at the first rank in 19·8% of cases and within the top three ranks in 27·0%. In comparison, for the eight non-English languages tested here, the correct diagnosis was placed at rank 1 in 16·9% to 20·5% of cases and within the top three in 25·3% to 27·7%.

INTERPRETATION: The differential diagnostic performance of GPT-4o across a comprehensive corpus of rare-disease cases was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings.

FUNDING: NHGRI 5U24HG011449 and 5RM1HG010860. P.N.R. was supported by a Professorship of the Alexander von Humboldt Foundation; P.L. was supported by a National Grant (PMP21/00063 ONTOPRECISC-III, Fondos FEDER).
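The abstract describes a pipeline of prompt generation from HPO-coded phenopackets, zero-shot querying of gpt-4o-2024-08-06, and ontology-based scoring of the ranked answers. The sketch below illustrates the first two steps under stated assumptions: the prompt template wording, the example HPO labels, and the single-case structure are hypothetical and not taken from the study; only the model identifier comes from the abstract.

```python
# Minimal sketch (not the authors' pipeline): build a zero-shot differential-diagnosis
# prompt from HPO term labels and query gpt-4o-2024-08-06 via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Phenotypic features of one hypothetical case, given as (translated) HPO term labels.
hpo_labels_en = ["Seizure", "Global developmental delay", "Hypotonia", "Microcephaly"]

# A language-specific template; only an illustrative English variant is sketched here.
PROMPT_TEMPLATE_EN = (
    "A patient presents with the following clinical findings:\n{features}\n\n"
    "Provide a ranked differential diagnosis of candidate rare genetic diseases, "
    "listing the most likely diagnosis first."
)

prompt = PROMPT_TEMPLATE_EN.format(
    features="\n".join(f"- {label}" for label in hpo_labels_en)
)

# Zero-shot: the prompt contains no worked examples, only the case description.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

The ranked responses are then scored automatically. A minimal stand-in for that evaluation is sketched below, assuming a precomputed dictionary of accepted disease labels and synonyms per case; the hand-written entry is illustrative only, whereas the study derives these mappings, including subtype-to-clinical-diagnosis links, from the Mondo disease ontology.

```python
# Hypothetical mapping from a case's true diagnosis to its accepted labels/synonyms.
ACCEPTED_LABELS = {
    "marfan syndrome": {"marfan syndrome", "mfs"},
}

def rank_of_correct_diagnosis(ranked_answers, true_disease):
    """Return the 1-based rank at which the correct diagnosis appears, or None."""
    accepted = ACCEPTED_LABELS[true_disease]
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer.strip().lower() in accepted:
            return rank
    return None

# A hit at rank 2 misses the rank-1 metric but counts toward the top-3 metric.
rank = rank_of_correct_diagnosis(
    ["ehlers-danlos syndrome", "marfan syndrome", "loeys-dietz syndrome"],
    "marfan syndrome",
)
print(rank is not None and rank <= 3)  # True
```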

