How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images

Z. Zhang, X. Zhang, K. Ichiji, I. Bukovský, N. Homma

Scientific Reports. 2023; 13(1): 19049. [pub] 20231103

Language: English. Country: England, United Kingdom

Document type: journal article, grant-supported work

Persistent link: https://www.medvik.cz/link/bmc24000740

Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods' (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or developing approaches to avoid the impact of inter-category imbalance (ICI), which means a difference in data quantity among categories. However, due to the ICI within each medical facility, medical data are often isolated and acquired in different settings among medical facilities, known as the issue of intra-source imbalance (ISI) characteristic. This imbalance also impacts the performance of DLMs but receives negligible attention. In this study, we study the impact of the ISI on DLMs by comparison of the version of a deep learning model that was trained separately by an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes a serious training bias, although the dataset has a good inter-category balance. In contrast, the deep learning model performed a reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that the intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.

000 00000naa a2200000 a 4500
001 bmc24000740
003 CZ-PrNML
005 20240213093340.0
007 ta
008 240109s2023 enk f 000 0|eng||
009 AR
024 7_ $a 10.1038/s41598-023-45368-w $2 doi
035 __ $a (PubMed)37923762
040 __ $a ABA008 $b cze $d ABA008 $e AACR2
041 0_ $a eng
044 __ $a enk
100 1_ $a Zhang, Zhang $u Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan. zhangzhang@dc.tohoku.ac.jp
245 10 $a How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images / $c Z. Zhang, X. Zhang, K. Ichiji, I. Bukovský, N. Homma
520 9_ $a Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods' (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or developing approaches to avoid the impact of inter-category imbalance (ICI), which means a difference in data quantity among categories. However, due to the ICI within each medical facility, medical data are often isolated and acquired in different settings among medical facilities, known as the issue of intra-source imbalance (ISI) characteristic. This imbalance also impacts the performance of DLMs but receives negligible attention. In this study, we study the impact of the ISI on DLMs by comparison of the version of a deep learning model that was trained separately by an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes a serious training bias, although the dataset has a good inter-category balance. In contrast, the deep learning model performed a reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that the intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.
650 _2 $a lidé $7 D006801
650 12 $a COVID-19 $x diagnostické zobrazování $7 D000086382
650 _2 $a testování na COVID-19 $7 D000086742
650 12 $a deep learning $7 D000077321
650 _2 $a rentgenové záření $7 D014965
650 _2 $a hrudník $7 D013909
655 _2 $a časopisecké články $7 D016428
655 _2 $a práce podpořená grantem $7 D013485
700 1_ $a Zhang, Xiaoyong $u Department of General Engineering, National Institute of Technology, Sendai College, Sendai, 989-3128, Japan $u Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan
700 1_ $a Ichiji, Kei $u Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
700 1_ $a Bukovský, Ivo $u Department of Computer Science, Faculty of Science, University of South Bohemia in Ceske Budejovice, 370 05, Ceske Budejovice, Czech Republic
700 1_ $a Homma, Noriyasu $u Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan $u Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan $u Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
773 0_ $w MED00182195 $t Scientific reports $x 2045-2322 $g Roč. 13, č. 1 (2023), s. 19049
856 41 $u https://pubmed.ncbi.nlm.nih.gov/37923762 $y Pubmed
910 __ $a ABA008 $b sig $c sign $y - $z 0
990 __ $a 20240109 $b ABA008
991 __ $a 20240213093337 $b ABA008
999 __ $a ok $b bmc $g 2049394 $s 1210434
BAS __ $a 3
BAS __ $a PreBMC-MEDLINE
BMC __ $a 2023 $b 13 $c 1 $d 19049 $e 20231103 $i 2045-2322 $m Scientific reports $n Sci Rep $x MED00182195
LZP __ $a Pubmed-20240109
