
Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists

J. Pristoupil, L. Oleaga, V. Junquero, C. Merino, OS. Sureyya, M. Kyncl, A. Burgetova, L. Lambert

Insights into imaging. 2025; 16 (1): 66. [pub] 20250322

Status: not indexed; Language: English; Country: Germany

Document type: journal article

Persistent link: https://www.medvik.cz/link/bmc25008188

Grant support
MH CZ-DRO, Motol University Hospital, 00064203 and General University Hospital in Prague, 00064165; Ministerstvo Zdravotnictví Ceské Republiky
Cooperatio, Medical Diagnostics and Basic Medical Sciences; Charles University in Prague

OBJECTIVES: This study aims to assess the accuracy of generative pre-trained transformer 4o (GPT-4o) in answering multiple-response questions from the European Diploma in Radiology (EDiR) examination, comparing its performance to that of human candidates.

MATERIALS AND METHODS: Results from 42 EDiR candidates across Europe were compared to those from 26 fourth-year medical students who answered exclusively using ChatGPT-4o in a prospective study (October 2024). The challenge consisted of 52 recall or understanding-based EDiR multiple-response questions, all without visual inputs.

RESULTS: GPT-4o achieved a mean score of 82.1 ± 3.0%, significantly outperforming the EDiR candidates, who scored 49.4 ± 10.5% (p < 0.0001). In particular, ChatGPT-4o demonstrated higher true positive rates while maintaining lower false positive rates than EDiR candidates, with a higher accuracy rate in all radiology subspecialties (p < 0.0001) except informatics (p = 0.20). There was near-perfect agreement among GPT-4o responses (κ = 0.872) and moderate agreement among EDiR participants (κ = 0.334). Exit surveys revealed that all participants used the copy-and-paste feature, and 73% submitted additional questions to clarify responses.

CONCLUSIONS: GPT-4o significantly outperformed human candidates in low-order, text-based EDiR multiple-response questions, demonstrating higher accuracy and reliability. These results highlight GPT-4o's potential in answering text-based radiology questions. Further research is necessary to investigate its performance across different question formats and candidate populations to ensure broader applicability and reliability.

CRITICAL RELEVANCE STATEMENT: GPT-4o significantly outperforms human candidates in factual radiology text-based questions in the EDiR, excelling especially in identifying correct responses, with a higher accuracy rate compared to radiologists.

KEY POINTS: In EDiR text-based questions, ChatGPT-4o scored higher (82%) than EDiR participants (49%). Compared to radiologists, GPT-4o excelled in identifying correct responses. GPT-4o responses demonstrated higher agreement (κ = 0.87) compared to EDiR candidates (κ = 0.33).


000      
00000naa a2200000 a 4500
001      
bmc25008188
003      
CZ-PrNML
005      
20250422095653.0
007      
ta
008      
250408s2025 gw f 000 0|eng||
009      
AR
024    7_
$a 10.1186/s13244-025-01941-7 $2 doi
035    __
$a (PubMed)40120065
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a gw
100    1_
$a Pristoupil, Jakub $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic
245    10
$a Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists / $c J. Pristoupil, L. Oleaga, V. Junquero, C. Merino, OS. Sureyya, M. Kyncl, A. Burgetova, L. Lambert
520    9_
$a OBJECTIVES: This study aims to assess the accuracy of generative pre-trained transformer 4o (GPT-4o) in answering multiple response questions from the European Diploma in Radiology (EDiR) examination, comparing its performance to that of human candidates. MATERIALS AND METHODS: Results from 42 EDiR candidates across Europe were compared to those from 26 fourth-year medical students who answered exclusively using the ChatGPT-4o in a prospective study (October 2024). The challenge consisted of 52 recall or understanding-based EDiR multiple-response questions, all without visual inputs. RESULTS: The GPT-4o achieved a mean score of 82.1 ± 3.0%, significantly outperforming the EDiR candidates with 49.4 ± 10.5% (p < 0.0001). In particular, chatGPT-4o demonstrated higher true positive rates while maintaining lower false positive rates compared to EDiR candidates, with a higher accuracy rate in all radiology subspecialties (p < 0.0001) except informatics (p = 0.20). There was near-perfect agreement between GPT-4 responses (κ = 0.872) and moderate agreement among EDiR participants (κ = 0.334). Exit surveys revealed that all participants used the copy-and-paste feature, and 73% submitted additional questions to clarify responses. CONCLUSIONS: GPT-4o significantly outperformed human candidates in low-order, text-based EDiR multiple-response questions, demonstrating higher accuracy and reliability. These results highlight GPT-4o's potential in answering text-based radiology questions. Further research is necessary to investigate its performance across different question formats and candidate populations to ensure broader applicability and reliability. CRITICAL RELEVANCE STATEMENT: GPT-4o significantly outperforms human candidates in factual radiology text-based questions in the EDiR, excelling especially in identifying correct responses, with a higher accuracy rate compared to radiologists. 
KEY POINTS: In EDiR text-based questions, ChatGPT-4o scored higher (82%) than EDiR participants (49%). Compared to radiologists, GPT-4o excelled in identifying correct responses. GPT-4o responses demonstrated higher agreement (κ = 0.87) compared to EDiR candidates (κ = 0.33).
590    __
$a NEINDEXOVÁNO
655    _2
$a časopisecké články $7 D016428
700    1_
$a Oleaga, Laura $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
700    1_
$a Junquero, Vanesa $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
700    1_
$a Merino, Cristina $u Department of Radiology, Clinical Diagnostic Imaging Centre, Hospital Clínic de Barcelona, Barcelona, Spain
700    1_
$a Sureyya, Ozbek Suha $u Era Radiology Center, Izmir, Turkey
700    1_
$a Kyncl, Martin $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic
700    1_
$a Burgetova, Andrea $u Department of Radiology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
700    1_
$a Lambert, Lukas $u Department of Imaging Methods, Motol University Hospital and Second Faculty of Medicine, Charles University, Prague, Czech Republic. lambert.lukas@gmail.com $1 https://orcid.org/0000000322994707 $7 xx0145830
773    0_
$w MED00181719 $t Insights into imaging $x 1869-4101 $g Roč. 16, č. 1 (2025), s. 66
856    41
$u https://pubmed.ncbi.nlm.nih.gov/40120065 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20250408 $b ABA008
991    __
$a 20250422095655 $b ABA008
999    __
$a ok $b bmc $g 2306297 $s 1245263
BAS    __
$a 3
BAS    __
$a PreBMC-PubMed-not-MEDLINE
BMC    __
$a 2025 $b 16 $c 1 $d 66 $e 20250322 $i 1869-4101 $m Insights into imaging $n Insights Imaging $x MED00181719
GRA    __
$a MH CZ-DRO, Motol University Hospital, 00064203 and General University Hospital in Prague, 00064165 $p Ministerstvo Zdravotnictví Ceské Republiky
GRA    __
$a Cooperatio, Medical Diagnostics and Basic Medical Sciences $p Charles University in Prague
LZP    __
$a Pubmed-20250408
