Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists
J. Pristoupil, L. Oleaga, V. Junquero, C. Merino, OS. Sureyya, M. Kyncl, A. Burgetova, L. Lambert
Status: not indexed; Language: English; Country: Germany
Document type: journal articles
Grant support
MH CZ-DRO, Motol University Hospital, 00064203 and General University Hospital in Prague, 00064165
Ministerstvo Zdravotnictví Ceské Republiky
Cooperatio, Medical Diagnostics and Basic Medical Sciences
Charles University in Prague
NLK
Directory of Open Access Journals
from 2012
Free Medical Journals
from 2010
PubMed Central
from 2010
Europe PubMed Central
from 2010
ProQuest Central
from 2012-08-01
Open Access Digital Library
from 2010-01-01
Open Access Digital Library
from 2010-01-01
Open Access Digital Library
from 2012-01-01
Nursing & Allied Health Database (ProQuest)
from 2012-08-01
Health & Medicine (ProQuest)
from 2012-08-01
ROAD: Directory of Open Access Scholarly Resources
from 2010
Springer Journals Complete - Open Access
from 2010-01-01
Springer Nature OA/Free Journals
from 2010-01-01
- Publication type
- journal articles MeSH
OBJECTIVES: This study aims to assess the accuracy of generative pre-trained transformer 4o (GPT-4o) in answering multiple response questions from the European Diploma in Radiology (EDiR) examination, comparing its performance to that of human candidates. MATERIALS AND METHODS: Results from 42 EDiR candidates across Europe were compared to those from 26 fourth-year medical students who answered exclusively using ChatGPT-4o in a prospective study (October 2024). The challenge consisted of 52 recall or understanding-based EDiR multiple-response questions, all without visual inputs. RESULTS: GPT-4o achieved a mean score of 82.1 ± 3.0%, significantly outperforming the EDiR candidates with 49.4 ± 10.5% (p < 0.0001). In particular, ChatGPT-4o demonstrated higher true positive rates while maintaining lower false positive rates compared to EDiR candidates, with a higher accuracy rate in all radiology subspecialties (p < 0.0001) except informatics (p = 0.20). There was near-perfect agreement between GPT-4o responses (κ = 0.872) and moderate agreement among EDiR participants (κ = 0.334). Exit surveys revealed that all participants used the copy-and-paste feature, and 73% submitted additional questions to clarify responses. CONCLUSIONS: GPT-4o significantly outperformed human candidates in low-order, text-based EDiR multiple-response questions, demonstrating higher accuracy and reliability. These results highlight GPT-4o's potential in answering text-based radiology questions. Further research is necessary to investigate its performance across different question formats and candidate populations to ensure broader applicability and reliability. CRITICAL RELEVANCE STATEMENT: GPT-4o significantly outperforms human candidates in factual radiology text-based questions in the EDiR, excelling especially in identifying correct responses, with a higher accuracy rate compared to radiologists.
KEY POINTS: In EDiR text-based questions, ChatGPT-4o scored higher (82%) than EDiR participants (49%). Compared to radiologists, GPT-4o excelled in identifying correct responses. GPT-4o responses demonstrated higher agreement (κ = 0.87) compared to EDiR candidates (κ = 0.33).
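The κ values in the abstract summarize how consistently the respondents in each group chose the same answers. The abstract does not say which kappa statistic was used. The sketch below assumes Fleiss' kappa, a common choice when there are more than two raters, and runs on made-up counts. The function name `fleiss_kappa` and the toy table are hypothetical and are not taken from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of
    raters, and there must be at least two raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed pairwise agreement for each item, averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance from the category shares alone.
    p_expected = sum(p * p for p in p_category)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data, NOT from the study: 4 answer options rated by 5 respondents,
# columns are [marked as correct, marked as incorrect].
toy_counts = [[5, 0], [0, 5], [4, 1], [1, 4]]
kappa = fleiss_kappa(toy_counts)
```

Values near 1 indicate that respondents almost always gave the same answer, while values near 0 indicate agreement no better than chance.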
Citations provided by Crossref.org
- 000
- 00000naa a2200000 a 4500
- 001
- bmc25008188
- 003
- CZ-PrNML
- 005
- 20250422095653.0
- 007
- ta
- 008
- 250408s2025 gw f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1186/s13244-025-01941-7 $2 doi
- 035 __
- $a (PubMed)40120065
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a gw
- 100 1_
- $a Pristoupil, Jakub $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic
- 245 10
- $a Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists / $c J. Pristoupil, L. Oleaga, V. Junquero, C. Merino, OS. Sureyya, M. Kyncl, A. Burgetova, L. Lambert
- 520 9_
- $a OBJECTIVES: This study aims to assess the accuracy of generative pre-trained transformer 4o (GPT-4o) in answering multiple response questions from the European Diploma in Radiology (EDiR) examination, comparing its performance to that of human candidates. MATERIALS AND METHODS: Results from 42 EDiR candidates across Europe were compared to those from 26 fourth-year medical students who answered exclusively using ChatGPT-4o in a prospective study (October 2024). The challenge consisted of 52 recall or understanding-based EDiR multiple-response questions, all without visual inputs. RESULTS: GPT-4o achieved a mean score of 82.1 ± 3.0%, significantly outperforming the EDiR candidates with 49.4 ± 10.5% (p < 0.0001). In particular, ChatGPT-4o demonstrated higher true positive rates while maintaining lower false positive rates compared to EDiR candidates, with a higher accuracy rate in all radiology subspecialties (p < 0.0001) except informatics (p = 0.20). There was near-perfect agreement between GPT-4o responses (κ = 0.872) and moderate agreement among EDiR participants (κ = 0.334). Exit surveys revealed that all participants used the copy-and-paste feature, and 73% submitted additional questions to clarify responses. CONCLUSIONS: GPT-4o significantly outperformed human candidates in low-order, text-based EDiR multiple-response questions, demonstrating higher accuracy and reliability. These results highlight GPT-4o's potential in answering text-based radiology questions. Further research is necessary to investigate its performance across different question formats and candidate populations to ensure broader applicability and reliability. CRITICAL RELEVANCE STATEMENT: GPT-4o significantly outperforms human candidates in factual radiology text-based questions in the EDiR, excelling especially in identifying correct responses, with a higher accuracy rate compared to radiologists.
KEY POINTS: In EDiR text-based questions, ChatGPT-4o scored higher (82%) than EDiR participants (49%). Compared to radiologists, GPT-4o excelled in identifying correct responses. GPT-4o responses demonstrated higher agreement (κ = 0.87) compared to EDiR candidates (κ = 0.33).
- 590 __
- $a NEINDEXOVÁNO
- 655 _2
- $a časopisecké články $7 D016428
- 700 1_
- $a Oleaga, Laura $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
- 700 1_
- $a Junquero, Vanesa $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
- 700 1_
- $a Merino, Cristina $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
- 700 1_
- $a Sureyya, Ozbek Suha $u Era Radiology Center, Izmir, Turkey
- 700 1_
- $a Kyncl, Martin $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic
- 700 1_
- $a Burgetova, Andrea $u Department of Radiology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- 700 1_
- $a Lambert, Lukas $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic. lambert.lukas@gmail.com $1 https://orcid.org/0000000322994707 $7 xx0145830
- 773 0_
- $w MED00181719 $t Insights into imaging $x 1869-4101 $g Roč. 16, č. 1 (2025), s. 66
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/40120065 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y - $z 0
- 990 __
- $a 20250408 $b ABA008
- 991 __
- $a 20250422095655 $b ABA008
- 999 __
- $a ok $b bmc $g 2306297 $s 1245263
- BAS __
- $a 3
- BAS __
- $a PreBMC-PubMed-not-MEDLINE
- BMC __
- $a 2025 $b 16 $c 1 $d 66 $e 20250322 $i 1869-4101 $m Insights into imaging $n Insights Imaging $x MED00181719
- GRA __
- $a MH CZ-DRO, Motol University Hospital, 00064203 and General University Hospital in Prague, 00064165 $p Ministerstvo Zdravotnictví Ceské Republiky
- GRA __
- $a Cooperatio, Medical Diagnostics and Basic Medical Sciences $p Charles University in Prague
- LZP __
- $a Pubmed-20250408
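The field listing above prints each MARC field value as a run of subfields, each introduced by `$` and a one-character code (for example `$a` for the main value and `$u` for an affiliation). As a minimal sketch of how such a displayed line could be split into code and value pairs, the hypothetical helper `parse_subfields` below assumes that every subfield marker is a `$`, one lowercase letter or digit, and a space, and that the values themselves never contain that pattern. It works on the text as shown in this view, not on binary MARC.

```python
import re

# A subfield marker as printed in this record view, e.g. "$a " or "$7 ".
_SUBFIELD_MARKER = re.compile(r"\$([0-9a-z]) ")


def parse_subfields(line: str) -> list[tuple[str, str]]:
    """Split a displayed MARC field value into (code, value) pairs."""
    parts = _SUBFIELD_MARKER.split(line.strip())
    # With one capture group, re.split returns
    # [prefix, code1, value1, code2, value2, ...]; the prefix (such as the
    # leading "- " list marker) is ignored.
    codes = parts[1::2]
    values = parts[2::2]
    return [(code, value.strip()) for code, value in zip(codes, values)]


# Two lines copied from the record above.
author = parse_subfields(
    "- $a Sureyya, Ozbek Suha $u Era Radiology Center, Izmir, Turkey"
)
identifier = parse_subfields("- $a 10.1186/s13244-025-01941-7 $2 doi")
```

Returning a list of pairs rather than a dictionary keeps subfield order and allows a code to repeat within one field.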