Image-Based Diagnostic Performance of LLMs vs CNNs for Oral Lichen Planus: Example-Guided and Differential Diagnosis
P. Rewthamrongsris, J. Burapacheep, E. Phattarataratip, P. Kulthanaamondhita, A. Tichy, F. Schwendicke, T. Osathanon, K. Sappayatosok
Language: English; Country: England, Great Britain
Document type: journal article, comparative study
- MeSH
- Diagnosis, Computer-Assisted * methods
- Diagnosis, Differential
- Language *
- Lichen Planus, Oral * diagnosis, diagnostic imaging, pathology
- Humans
- Neural Networks, Computer *
- Check Tag
- Humans
- Publication Type
- Journal Article
- Comparative Study
INTRODUCTION AND AIMS: The characteristics of oral lichen planus (OLP), a chronic inflammatory condition of the oral mucosa, overlap with those of other oral lesions, which presents diagnostic challenges. Large language models (LLMs) with integrated computer-vision capabilities and convolutional neural networks (CNNs) constitute an alternative diagnostic modality. We evaluated the ability of seven LLMs, including both proprietary and open-source models, to detect OLP from intraoral images and to generate differential diagnoses. METHODS: A dataset of 1,142 clinical photographs of histopathologically confirmed OLP, non-OLP lesions, and normal mucosa was used. The LLMs were tested using three experimental designs: zero-shot recognition, example-guided recognition, and differential diagnosis. Performance was measured using accuracy, precision, recall, F1-score, and discounted cumulative gain (DCG). Furthermore, the performance of the LLMs was compared with that of three previously published CNN-based models for OLP detection on a subset of 110 photographs that had previously been used to test the CNN models. RESULTS: Gemini 1.5 Pro and Flash demonstrated the highest accuracy (69.69%) in zero-shot recognition, whereas GPT-4o ranked first in F1-score (76.10%). With example-guided prompts, which improved consistency and reduced refusal rates, Gemini 1.5 Flash achieved the highest accuracy (80.53%) and F1-score (84.54%); however, Claude 3.5 Sonnet achieved the highest DCG score of 0.63. Although the proprietary models generally excelled, the open-source Llama model demonstrated notable strengths in ranking relevant diagnoses despite moderate performance in detection tasks. All LLMs were outperformed by the CNN models. CONCLUSION: The seven evaluated LLMs lack sufficient performance for clinical use. CNNs trained to detect OLP outperformed the LLMs tested in this study.
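For readers unfamiliar with the ranking metric named in the abstract, the following is a minimal sketch (Python, standard library only) of how discounted cumulative gain and F1-score can be computed over a model's ranked differential-diagnosis list; the relevance list and confusion counts below are illustrative assumptions, not values from the study.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list: sum of rel_i / log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision (tp / (tp + fp)) and recall (tp / (tp + fn))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical example: a model's ranked differential diagnoses for one image,
# scored 1 where a listed diagnosis matches the histopathological ground truth.
ranked_relevance = [0, 0, 1]          # correct diagnosis listed third -> DCG = 0.50
print(f"DCG = {dcg(ranked_relevance):.2f}")
print(f"F1  = {f1_score(tp=80, fp=20, fn=15):.2f}")  # made-up confusion counts
```

A correct diagnosis placed lower in the ranked list is discounted logarithmically, which is why DCG rewards models that put the right diagnosis near the top rather than merely somewhere in the list.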
College of Dental Medicine, Rangsit University, Pathum Thani, Thailand
Department of Computer Science, Stanford University, Stanford, California, USA
Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany
Department of Oral Pathology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
Institute of Dental Medicine, 1st Faculty of Medicine, Charles University, Prague, Czech Republic
Citations provided by Crossref.org
- 000
- 00000naa a2200000 a 4500
- 001
- bmc25022183
- 003
- CZ-PrNML
- 005
- 20251023080122.0
- 007
- ta
- 008
- 251014s2025 enk f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.identj.2025.100848 $2 doi
- 035 __
- $a (PubMed)40482575
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a enk
- 100 1_
- $a Rewthamrongsris, Paak $u Center of Artificial Intelligence and Innovation (CAII) and Center of Excellence for Dental Stem Cell Biology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany
- 245 10
- $a Image-Based Diagnostic Performance of LLMs vs CNNs for Oral Lichen Planus: Example-Guided and Differential Diagnosis / $c P. Rewthamrongsris, J. Burapacheep, E. Phattarataratip, P. Kulthanaamondhita, A. Tichy, F. Schwendicke, T. Osathanon, K. Sappayatosok
- 520 9_
- $a INTRODUCTION AND AIMS: The characteristics of oral lichen planus (OLP), a chronic inflammatory condition of the oral mucosa, overlap with those of other oral lesions, which presents diagnostic challenges. Large language models (LLMs) with integrated computer-vision capabilities and convolutional neural networks (CNNs) constitute an alternative diagnostic modality. We evaluated the ability of seven LLMs, including both proprietary and open-source models, to detect OLP from intraoral images and to generate differential diagnoses. METHODS: A dataset of 1,142 clinical photographs of histopathologically confirmed OLP, non-OLP lesions, and normal mucosa was used. The LLMs were tested using three experimental designs: zero-shot recognition, example-guided recognition, and differential diagnosis. Performance was measured using accuracy, precision, recall, F1-score, and discounted cumulative gain (DCG). Furthermore, the performance of the LLMs was compared with that of three previously published CNN-based models for OLP detection on a subset of 110 photographs that had previously been used to test the CNN models. RESULTS: Gemini 1.5 Pro and Flash demonstrated the highest accuracy (69.69%) in zero-shot recognition, whereas GPT-4o ranked first in F1-score (76.10%). With example-guided prompts, which improved consistency and reduced refusal rates, Gemini 1.5 Flash achieved the highest accuracy (80.53%) and F1-score (84.54%); however, Claude 3.5 Sonnet achieved the highest DCG score of 0.63. Although the proprietary models generally excelled, the open-source Llama model demonstrated notable strengths in ranking relevant diagnoses despite moderate performance in detection tasks. All LLMs were outperformed by the CNN models. CONCLUSION: The seven evaluated LLMs lack sufficient performance for clinical use. CNNs trained to detect OLP outperformed the LLMs tested in this study.
- 650 12
- $a lichen planus orální $x diagnóza $x diagnostické zobrazování $x patologie $7 D017676
- 650 _2
- $a lidé $7 D006801
- 650 _2
- $a diferenciální diagnóza $7 D003937
- 650 12
- $a neuronové sítě $7 D016571
- 650 12
- $a jazyk (prostředek komunikace) $7 D007802
- 650 12
- $a diagnóza počítačová $x metody $7 D003936
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a srovnávací studie $7 D003160
- 700 1_
- $a Burapacheep, Jirayu $u Department of Computer Science, Stanford University, Stanford, California, USA
- 700 1_
- $a Phattarataratip, Ekarat $u Department of Oral Pathology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- 700 1_
- $a Kulthanaamondhita, Promphakkon $u College of Dental Medicine, Rangsit University, Pathum Thani, Thailand
- 700 1_
- $a Tichy, Antonin $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany; Institute of Dental Medicine, First Faculty of Medicine, Charles University, Prague, Czech Republic
- 700 1_
- $a Schwendicke, Falk $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Germany
- 700 1_
- $a Osathanon, Thanaphum $u Center of Artificial Intelligence and Innovation (CAII) and Center of Excellence for Dental Stem Cell Biology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- 700 1_
- $a Sappayatosok, Kraisorn $u College of Dental Medicine, Rangsit University, Pathum Thani, Thailand. Electronic address: kraisorn.s@rsu.ac.th
- 773 0_
- $w MED00002275 $t International dental journal $x 1875-595X $g Roč. 75, č. 4 (2025), s. 100848
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/40482575 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y - $z 0
- 990 __
- $a 20251014 $b ABA008
- 991 __
- $a 20251023080128 $b ABA008
- 999 __
- $a ok $b bmc $g 2417149 $s 1260346
- BAS __
- $a 3
- BAS __
- $a PreBMC-MEDLINE
- BMC __
- $a 2025 $b 75 $c 4 $d 100848 $e 20250606 $i 1875-595X $m International dental journal $n Int Dent J $x MED00002275
- LZP __
- $a Pubmed-20251014