GastroGPT: Development and controlled testing of a proof-of-concept customized clinical language model

C. Simsek, M. Ucdal, E. de-Madaria, A. Ebigbo, P. Vanek, O. Elshaarawy, TA. Voiosu, G. Antonelli, R. Turró, JP. Gisbert, OP. Nyssen, C. Hassan, H. Messmann, R. Jalan

Endoscopy International Open. 2025; 13: a26372163. Published 2025-08-06.

Status: not indexed. Language: English. Country: Germany.

Document type: journal articles

Persistent link   https://www.medvik.cz/link/bmc25020640

BACKGROUND AND STUDY AIMS: Current general-purpose artificial intelligence (AI) large language models (LLMs) demonstrate limited efficacy in clinical medicine, often constrained to question-answering, documentation, and literature summarization roles. We developed GastroGPT, a proof-of-concept specialty-specific, multi-task, clinical LLM, and evaluated its performance against leading general-purpose LLMs across key gastroenterology tasks and diverse case scenarios.

METHODS: In this structured analysis, GastroGPT was compared with three state-of-the-art general-purpose LLMs (LLM-A: GPT-4, LLM-B: Bard, LLM-C: Claude). Models were assessed on seven clinical tasks and overall performance across 10 simulated gastroenterology cases varying in complexity, frequency, and patient demographics. Standardized prompts facilitated structured comparisons. A blinded expert panel rated model outputs per task on a 10-point Likert scale, judging clinical utility. Comprehensive statistical analyses were conducted.

RESULTS: A total of 2,240 expert ratings were obtained. GastroGPT achieved significantly higher mean overall scores (8.1 ± 1.8) compared with GPT-4 (5.2 ± 3.0), Bard (5.7 ± 3.3), and Claude (7.0 ± 2.7) (all P < 0.001). It outperformed comparators in six of seven tasks (P < 0.05), except follow-up planning. GastroGPT demonstrated superior score consistency (variance 34.95) versus general models (97.4-260.35) (P < 0.001). Its performance remained consistent across case complexities and frequencies, unlike the comparators (P < 0.001). Multivariate analysis revealed that model type significantly predicted performance (P < 0.001).

CONCLUSIONS: This study pioneered development and comparison of a specialty-specific, clinically-oriented AI model to general-purpose LLMs. GastroGPT demonstrated superior utility overall and on key gastroenterology tasks, highlighting the potential for tailored, task-focused AI models in medicine.

000      
00000naa a2200000 a 4500
001      
bmc25020640
003      
CZ-PrNML
005      
20251014150312.0
007      
ta
008      
251007e20250806gw f 000 0|eng||
009      
AR
024    7_
$a 10.1055/a-2637-2163 $2 doi
035    __
$a (PubMed)40860687
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a gw
100    1_
$a Simsek, Cem $u Gastroenterology & Hepatology, Johns Hopkins Medical Institutions Campus, Baltimore, United States
245    10
$a GastroGPT: Development and controlled testing of a proof-of-concept customized clinical language model / $c C. Simsek, M. Ucdal, E. de-Madaria, A. Ebigbo, P. Vanek, O. Elshaarawy, TA. Voiosu, G. Antonelli, R. Turró, JP. Gisbert, OP. Nyssen, C. Hassan, H. Messmann, R. Jalan
590    __
$a NEINDEXOVÁNO
655    _2
$a časopisecké články $7 D016428
700    1_
$a Ucdal, Mete $u Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, Turkey
700    1_
$a de-Madaria, Enrique $u Dr Balmis General University Hospital, Alicante, Spain
700    1_
$a Ebigbo, Alanna $u Division of Gastroenterology, Universitätsklinikum Augsburg, Augsburg, Germany
700    1_
$a Vanek, Petr $u Palacky University Olomouc, Olomouc, Czech Republic
700    1_
$a Elshaarawy, Omar $u Liverpool University Hospitals NHS Foundation Trust, Liverpool, United Kingdom of Great Britain and Northern Ireland $u National Liver Institute, Shebeen El-Kom, Egypt
700    1_
$a Voiosu, Theodor Alexandru $u Gastroenterology, Colentina Hospital, Bucharest, Romania
700    1_
$a Antonelli, Giulio $u Sapienza University of Rome, Digestive and Liver Disease Unit, Azienda Ospedaliera Sant'Andrea, Roma, Italy $1 https://orcid.org/0000000317973864
700    1_
$a Turró, Román $u Endoscopy Unit, Teknon Medical Center, Barcelona, Spain
700    1_
$a Gisbert, Javier P $u Division of Gastroenterology, Faculty of Medicine, Hospital Universitario de la Princesa, Madrid, Spain
700    1_
$a Nyssen, Olga P $u Hospital Universitario de la Princesa, Madrid, Spain
700    1_
$a Hassan, Cesare $u Digestive Endoscopy Unit, Humanitas Research Hospital Department of Gastroenterology, Milan, Italy
700    1_
$a Messmann, Helmut $u Division of Gastroenterology, Universitätsklinikum Augsburg, Augsburg, Germany
700    1_
$a Jalan, Rajiv $u University College Hospital London Medical School, London, United Kingdom of Great Britain and Northern Ireland
773    0_
$w MED00200138 $t Endoscopy international open $x 2364-3722 $g Roč. 13 (20250806), s. a26372163
856    41
$u https://pubmed.ncbi.nlm.nih.gov/40860687 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20251007 $b ABA008
991    __
$a 20251014150318 $b ABA008
999    __
$a ok $b bmc $g 2410875 $s 1258796
BAS    __
$a 3
BAS    __
$a PreBMC-PubMed-not-MEDLINE
BMC    __
$a 2025 $b 13 $c - $d a26372163 $e 20250806 $i 2364-3722 $m Endoscopy international open $n Endosc Int Open $x MED00200138
LZP    __
$a Pubmed-20251007
