Improving machine learning-based bitewing segmentation with synthetic data

E. Tolstaya, A. Tichy, S. Paris, F. Schwendicke

Journal of dentistry. 2025 ; 156 (-) : 105679. [pub] 20250309

Language English Country England, Great Britain

Document type Journal Article

OBJECTIVES: Class imbalance in datasets is one of the challenges of machine learning (ML) in medical image analysis. We employed synthetic data to overcome class imbalance when segmenting bitewing radiographs as an exemplary task for using ML.

METHODS: After segmenting bitewings into classes, i.e. dental structures, restorations, and background, the pixel-level representation of implants in the training set (1543 bitewings) and testing set (177 bitewings) was 0.03 % and 0.07 %, respectively. A diffusion model and a generative adversarial network (pix2pix) were used to generate a dataset synthetically enriched in implants. A U-Net segmentation model was trained on (1) the original dataset, (2) the synthetic dataset, (3) the synthetic dataset followed by fine-tuning on the original dataset, or (4) a dataset naïvely oversampled with images containing implants.

RESULTS: U-Net trained on the original dataset was unable to segment implants in the testing set. Naïve oversampling significantly improved model performance and achieved the highest precision. The model trained only on synthetic data performed worse than naïve oversampling in all metrics, but after fine-tuning on original data it achieved the highest Dice score, recall, F1 score, and ROC AUC. Performance on classes other than implants was similar for all strategies except training only on synthetic data, which tended to perform worse.

CONCLUSIONS: The use of synthetic data alone may degrade the performance of segmentation models. However, fine-tuning on original data could significantly enhance model performance, especially for heavily underrepresented classes.

CLINICAL SIGNIFICANCE: This study explored the use of synthetic data to enhance segmentation of bitewing radiographs, focusing on underrepresented classes like implants. Pre-training on synthetic data followed by fine-tuning on original data yielded the best results, highlighting the potential of synthetic data to advance AI-driven dental imaging and ultimately support clinical decision-making.
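
The quantities named in the abstract (pixel-level class representation, naïve oversampling, and the per-class Dice score) can be illustrated with a minimal sketch. This is not the authors' pipeline: the label id `IMPLANT`, the helper names, and the toy masks are assumptions made for illustration, and a real setup would oversample image/mask pairs rather than masks alone.

```python
import random

IMPLANT = 3  # hypothetical label id for the rare class (assumption)


def class_fraction(masks, label):
    """Share of all pixels across `masks` that carry `label`."""
    total = sum(len(row) for mask in masks for row in mask)
    hits = sum(row.count(label) for mask in masks for row in mask)
    return hits / total if total else 0.0


def naive_oversample(masks, label, copies, seed=0):
    """Append `copies` extra repeats of every mask containing `label`, then shuffle."""
    rare = [m for m in masks if any(label in row for row in m)]
    out = list(masks) + rare * copies
    random.Random(seed).shuffle(out)
    return out


def dice(pred, truth, label):
    """Dice score for one class: 2 * |P and T| / (|P| + |T|)."""
    p = t = both = 0
    for prow, trow in zip(pred, truth):
        for a, b in zip(prow, trow):
            p += (a == label)
            t += (b == label)
            both += (a == label and b == label)
    # Convention chosen here: a class absent from both masks scores 1.0.
    return 2 * both / (p + t) if (p + t) else 1.0


# Toy data: nine 2x2 background masks and one mask with a single implant pixel.
masks = [[[0, 0], [0, 0]] for _ in range(9)] + [[[0, IMPLANT], [0, 0]]]
balanced = naive_oversample(masks, IMPLANT, copies=9)

print(class_fraction(masks, IMPLANT))     # 1 of 40 pixels
print(class_fraction(balanced, IMPLANT))  # 10 of 76 pixels
```

Oversampling raises the share of implant pixels the model sees but only repeats existing examples, whereas the generative approach described in the abstract produces new implant-containing images.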


000      
00000naa a2200000 a 4500
001      
bmc25015924
003      
CZ-PrNML
005      
20250731091348.0
007      
ta
008      
250708e20250309enk f 000 0|eng||
009      
AR
024    7_
$a 10.1016/j.jdent.2025.105679 $2 doi
035    __
$a (PubMed)40068717
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Tolstaya, Ekaterina $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Goethestraße 70, 80 336, Munich, Germany
245    10
$a Improving machine learning-based bitewing segmentation with synthetic data / $c E. Tolstaya, A. Tichy, S. Paris, F. Schwendicke
520    9_
$a OBJECTIVES: Class imbalance in datasets is one of the challenges of machine learning (ML) in medical image analysis. We employed synthetic data to overcome class imbalance when segmenting bitewing radiographs as an exemplary task for using ML. METHODS: After segmenting bitewings into classes, i.e. dental structures, restorations, and background, the pixel-level representation of implants in the training set (1543 bitewings) and testing set (177 bitewings) was 0.03 % and 0.07 %, respectively. A diffusion model and a generative adversarial network (pix2pix) were used to generate a dataset synthetically enriched in implants. A U-Net segmentation model was trained on (1) the original dataset, (2) the synthetic dataset, (3) on the synthetic dataset and fine-tuned on the original dataset, or (4) on a dataset which was naïvely oversampled with images containing implants. RESULTS: U-Net trained on the original dataset was unable to segment implants in the testing set. Model performance was significantly improved by naïve over-sampling, achieving the highest precision. The model trained only on synthetic data performed worse than naïve over-sampling in all metrics, but with fine-tuning on original data, it resulted in the highest Dice score, recall, F1 score and ROC AUC, respectively. The performance on other classes than implants was similar for all strategies except training only on synthetic data, which tended to perform worse. CONCLUSIONS: The use of synthetic data alone may deteriorate the performance of segmentation models. However, fine-tuning on original data could significantly enhance model performance, especially for heavily underrepresented classes. CLINICAL SIGNIFICANCE: This study explored the use of synthetic data to enhance segmentation of bitewing radiographs, focusing on underrepresented classes like implants. Pre-training on synthetic data followed by fine-tuning on original data yielded the best results, highlighting the potential of synthetic data to advance AI-driven dental imaging and ultimately support clinical decision-making.
650    12
$a machine learning $7 D000069550
650    _2
$a humans $7 D006801
650    _2
$a dental implants $7 D015921
650    12
$a image processing, computer-assisted $x methods $7 D007091
655    _2
$a journal article $7 D016428
700    1_
$a Tichy, Antonin $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Goethestraße 70, 80 336, Munich, Germany; Institute of Dental Medicine, First Faculty of Medicine of the Charles University and General University Hospital in Prague, Karlovo namesti 32, 121 11, Prague, Czech Republic
700    1_
$a Paris, Sebastian $u Operative and Preventive Dentistry, Charité - Universitätsmedizin Berlin, Assmannshauser Straße 4-6, 14197 Berlin, Germany
700    1_
$a Schwendicke, Falk $u Department of Conservative Dentistry and Periodontology, LMU University Hospital, LMU Munich, Goethestraße 70, 80 336, Munich, Germany. Electronic address: Falk.Schwendicke@med.uni-muenchen.de
773    0_
$w MED00002631 $t Journal of dentistry $x 1879-176X $g Vol. 156 (20250309), p. 105679
856    41
$u https://pubmed.ncbi.nlm.nih.gov/40068717 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y - $z 0
990    __
$a 20250708 $b ABA008
991    __
$a 20250731091342 $b ABA008
999    __
$a ok $b bmc $g 2366635 $s 1253049
BAS    __
$a 3
BAS    __
$a PreBMC-MEDLINE
BMC    __
$a 2025 $b 156 $c - $d 105679 $e 20250309 $i 1879-176X $m Journal of dentistry $n J Dent $x MED00002631
LZP    __
$a Pubmed-20250708
