
Image-Based Diagnostic Performance of LLMs vs CNNs for Oral Lichen Planus: Example-Guided and Differential Diagnosis

P. Rewthamrongsris, J. Burapacheep, E. Phattarataratip, P. Kulthanaamondhita, A. Tichy, F. Schwendicke, T. Osathanon, K. Sappayatosok

International dental journal. 2025 ; 75 (4) : 100848. [pub] 20250606

Language: English. Country: England, Great Britain

Document type: journal articles, comparative study

Persistent link   https://www.medvik.cz/link/bmc25022183

INTRODUCTION AND AIMS: The overlapping characteristics of oral lichen planus (OLP), a chronic oral mucosal inflammatory condition, with those of other oral lesions present diagnostic challenges. Large language models (LLMs) with integrated computer-vision capabilities and convolutional neural networks (CNNs) constitute an alternative diagnostic modality. We evaluated the ability of seven LLMs, including both proprietary and open-source models, to detect OLP from intraoral images and to generate differential diagnoses. METHODS: A dataset of 1,142 clinical photographs of histopathologically confirmed OLP, non-OLP lesions, and normal mucosa was used. The LLMs were tested using three experimental designs: zero-shot recognition, example-guided recognition, and differential diagnosis. Performance was measured using accuracy, precision, recall, F1-score, and discounted cumulative gain (DCG). Furthermore, the performance of the LLMs was compared with that of three previously published CNN-based models for OLP detection on a subset of 110 photographs that had previously been used to test the CNN models. RESULTS: Gemini 1.5 Pro and Flash demonstrated the highest accuracy (69.69%) in zero-shot recognition, whereas GPT-4o ranked first in F1-score (76.10%). With example-guided prompts, which improved consistency and reduced refusal rates, Gemini 1.5 Flash achieved the highest accuracy (80.53%) and F1-score (84.54%); however, Claude 3.5 Sonnet achieved the highest DCG (0.63). Although the proprietary models generally excelled, the open-source Llama model demonstrated notable strengths in ranking relevant diagnoses despite moderate performance in detection tasks. All LLMs were outperformed by the CNN models. CONCLUSION: The seven evaluated LLMs lack sufficient performance for clinical use. CNNs trained to detect OLP outperformed the LLMs tested in this study.
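As a minimal sketch of the evaluation metrics named in the abstract (this is not the study's code; the labels and the ranked differential below are invented toy data for illustration), precision, recall, F1-score, and DCG can be computed as follows:

```python
# Illustrative sketch of the metrics from the abstract, on toy data.
import math

def precision_recall_f1(y_true, y_pred, positive="OLP"):
    """Precision, recall, and F1 for one positive class (here OLP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def dcg(relevances):
    """Discounted cumulative gain: rewards relevant diagnoses ranked early.
    relevances[i] is the relevance of the diagnosis at rank i (0-based)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

# Toy labels: three OLP images and two non-OLP images.
y_true = ["OLP", "OLP", "OLP", "non-OLP", "non-OLP"]
y_pred = ["OLP", "non-OLP", "OLP", "non-OLP", "OLP"]
p, r, f1 = precision_recall_f1(y_true, y_pred)

# Toy ranked differential: the correct diagnosis (relevance 1) sits at rank 2
# of three, so the DCG is 1/log2(3), i.e. about 0.63.
dcg_score = dcg([0, 1, 0])
```

The DCG values reported in the study (e.g. Claude 3.5 Sonnet's 0.63) reflect this kind of rank-discounted credit for placing the correct diagnosis high in the differential list.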

Citations provided by Crossref.org

000 __ 00000naa a2200000 a 4500
001 __ bmc25022183
003 __ CZ-PrNML
005 __ 20251023080122.0
007 __ ta
008 __ 251014s2025 enk f 000 0|eng||
009 __ AR
024 7_ $a 10.1016/j.identj.2025.100848 $2 doi
035 __ $a (PubMed)40482575
040 __ $a ABA008 $b cze $d ABA008 $e AACR2
041 0_ $a eng
044 __ $a enk
100 1_ $a Rewthamrongsris, Paak $u Center of Artificial Intelligence and Innovation (CAII) and Center of Excellence for Dental Stem Cell Biology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany
245 10 $a Image-Based Diagnostic Performance of LLMs vs CNNs for Oral Lichen Planus: Example-Guided and Differential Diagnosis / $c P. Rewthamrongsris, J. Burapacheep, E. Phattarataratip, P. Kulthanaamondhita, A. Tichy, F. Schwendicke, T. Osathanon, K. Sappayatosok
520 9_ $a INTRODUCTION AND AIMS: The overlapping characteristics of oral lichen planus (OLP), a chronic oral mucosal inflammatory condition, with those of other oral lesions present diagnostic challenges. Large language models (LLMs) with integrated computer-vision capabilities and convolutional neural networks (CNNs) constitute an alternative diagnostic modality. We evaluated the ability of seven LLMs, including both proprietary and open-source models, to detect OLP from intraoral images and to generate differential diagnoses. METHODS: A dataset of 1,142 clinical photographs of histopathologically confirmed OLP, non-OLP lesions, and normal mucosa was used. The LLMs were tested using three experimental designs: zero-shot recognition, example-guided recognition, and differential diagnosis. Performance was measured using accuracy, precision, recall, F1-score, and discounted cumulative gain (DCG). Furthermore, the performance of the LLMs was compared with that of three previously published CNN-based models for OLP detection on a subset of 110 photographs that had previously been used to test the CNN models. RESULTS: Gemini 1.5 Pro and Flash demonstrated the highest accuracy (69.69%) in zero-shot recognition, whereas GPT-4o ranked first in F1-score (76.10%). With example-guided prompts, which improved consistency and reduced refusal rates, Gemini 1.5 Flash achieved the highest accuracy (80.53%) and F1-score (84.54%); however, Claude 3.5 Sonnet achieved the highest DCG (0.63). Although the proprietary models generally excelled, the open-source Llama model demonstrated notable strengths in ranking relevant diagnoses despite moderate performance in detection tasks. All LLMs were outperformed by the CNN models. CONCLUSION: The seven evaluated LLMs lack sufficient performance for clinical use. CNNs trained to detect OLP outperformed the LLMs tested in this study.
650 12 $a lichen planus orální $x diagnóza $x diagnostické zobrazování $x patologie $7 D017676
650 _2 $a lidé $7 D006801
650 _2 $a diferenciální diagnóza $7 D003937
650 12 $a neuronové sítě $7 D016571
650 12 $a jazyk (prostředek komunikace) $7 D007802
650 12 $a diagnóza počítačová $x metody $7 D003936
655 _2 $a časopisecké články $7 D016428
655 _2 $a srovnávací studie $7 D003160
700 1_ $a Burapacheep, Jirayu $u Department of Computer Science, Stanford University, Stanford, California, USA
700 1_ $a Phattarataratip, Ekarat $u Department of Oral Pathology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
700 1_ $a Kulthanaamondhita, Promphakkon $u College of Dental Medicine, Rangsit University, Pathum Thani, Thailand
700 1_ $a Tichy, Antonin $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany; Institute of Dental Medicine, First Faculty of Medicine, Charles University, Prague, Czech Republic
700 1_ $a Schwendicke, Falk $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany
700 1_ $a Osathanon, Thanaphum $u Center of Artificial Intelligence and Innovation (CAII) and Center of Excellence for Dental Stem Cell Biology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
700 1_ $a Sappayatosok, Kraisorn $u College of Dental Medicine, Rangsit University, Pathum Thani, Thailand. Electronic address: kraisorn.s@rsu.ac.th
773 0_ $w MED00002275 $t International dental journal $x 1875-595X $g Roč. 75, č. 4 (2025), s. 100848
856 41 $u https://pubmed.ncbi.nlm.nih.gov/40482575 $y Pubmed
910 __ $a ABA008 $b sig $c sign $y - $z 0
990 __ $a 20251014 $b ABA008
991 __ $a 20251023080128 $b ABA008
999 __ $a ok $b bmc $g 2417149 $s 1260346
BAS __ $a 3
BAS __ $a PreBMC-MEDLINE
BMC __ $a 2025 $b 75 $c 4 $d 100848 $e 20250606 $i 1875-595X $m International dental journal $n Int Dent J $x MED00002275
LZP __ $a Pubmed-20251014
