How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images
Z. Zhang, X. Zhang, K. Ichiji, I. Bukovský, N. Homma
Language: English. Country: England, United Kingdom
Document type: Journal Article, Research Support, Non-U.S. Gov't
NLK
Directory of Open Access Journals
from 2011
Free Medical Journals
from 2011
Nature Open Access
from 2011-12-01
PubMed Central
from 2011
Europe PubMed Central
from 2011
ProQuest Central
from 2011-01-01
Open Access Digital Library
from 2011-01-01
Health & Medicine (ProQuest)
from 2011-01-01
ROAD: Directory of Open Access Scholarly Resources
from 2011
Springer Nature OA/Free Journals
from 2011-12-01
- MeSH
- COVID-19 * diagnostic imaging MeSH
- Deep Learning * MeSH
- Thorax MeSH
- Humans MeSH
- X-Rays MeSH
- COVID-19 Testing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods' (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or developing approaches to avoid the impact of inter-category imbalance (ICI), which means a difference in data quantity among categories. However, due to the ICI within each medical facility, medical data are often isolated and acquired in different settings among medical facilities, known as the issue of intra-source imbalance (ISI) characteristic. This imbalance also impacts the performance of DLMs but receives negligible attention. In this study, we study the impact of the ISI on DLMs by comparison of the version of a deep learning model that was trained separately by an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes a serious training bias, although the dataset has a good inter-category balance. In contrast, the deep learning model performed a reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that the intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.
Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan
Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan
Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
Citations provided by Crossref.org
- 000
- 00000naa a2200000 a 4500
- 001
- bmc24000740
- 003
- CZ-PrNML
- 005
- 20240213093340.0
- 007
- ta
- 008
- 240109s2023 enk f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1038/s41598-023-45368-w $2 doi
- 035 __
- $a (PubMed)37923762
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a enk
- 100 1_
- $a Zhang, Zhang $u Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan. zhangzhang@dc.tohoku.ac.jp
- 245 10
- $a How intra-source imbalanced datasets impact the performance of deep learning for COVID-19 diagnosis using chest X-ray images / $c Z. Zhang, X. Zhang, K. Ichiji, I. Bukovský, N. Homma
- 520 9_
- $a Over the past decade, the use of deep learning has been widely increasing in the medical image diagnosis field. Deep learning-based methods' (DLMs) performance strongly relies on training data. Therefore, researchers often focus on collecting as much data as possible from different medical facilities or developing approaches to avoid the impact of inter-category imbalance (ICI), which means a difference in data quantity among categories. However, due to the ICI within each medical facility, medical data are often isolated and acquired in different settings among medical facilities, known as the issue of intra-source imbalance (ISI) characteristic. This imbalance also impacts the performance of DLMs but receives negligible attention. In this study, we study the impact of the ISI on DLMs by comparison of the version of a deep learning model that was trained separately by an intra-source imbalanced chest X-ray (CXR) dataset and an intra-source balanced CXR dataset for COVID-19 diagnosis. The finding is that using the intra-source imbalanced dataset causes a serious training bias, although the dataset has a good inter-category balance. In contrast, the deep learning model performed a reliable diagnosis when trained on the intra-source balanced dataset. Therefore, our study reports clear evidence that the intra-source balance is vital for training data to minimize the risk of poor performance of DLMs.
- 650 _2
- $a lidé $7 D006801
- 650 12
- $a COVID-19 $x diagnostické zobrazování $7 D000086382
- 650 _2
- $a testování na COVID-19 $7 D000086742
- 650 12
- $a deep learning $7 D000077321
- 650 _2
- $a rentgenové záření $7 D014965
- 650 _2
- $a hrudník $7 D013909
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a práce podpořená grantem $7 D013485
- 700 1_
- $a Zhang, Xiaoyong $u Department of General Engineering, National Institute of Technology, Sendai College, Sendai, 989-3128, Japan $u Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan
- 700 1_
- $a Ichiji, Kei $u Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
- 700 1_
- $a Bukovský, Ivo $u Department of Computer Science, Faculty of Science, University of South Bohemia in Ceske Budejovice, 370 05, Ceske Budejovice, Czech Republic
- 700 1_
- $a Homma, Noriyasu $u Graduate School of Biomedical Engineering, Tohoku University, Sendai, 980-8576, Japan $u Institute of Development, Aging and Cancer, Tohoku University, Sendai, 980-8576, Japan $u Tohoku University Graduate School of Medicine, Tohoku University, Sendai, 980-8576, Japan
- 773 0_
- $w MED00182195 $t Scientific reports $x 2045-2322 $g Roč. 13, č. 1 (2023), s. 19049
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/37923762 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y - $z 0
- 990 __
- $a 20240109 $b ABA008
- 991 __
- $a 20240213093337 $b ABA008
- 999 __
- $a ok $b bmc $g 2049394 $s 1210434
- BAS __
- $a 3
- BAS __
- $a PreBMC-MEDLINE
- BMC __
- $a 2023 $b 13 $c 1 $d 19049 $e 20231103 $i 2045-2322 $m Scientific reports $n Sci Rep $x MED00182195
- LZP __
- $a Pubmed-20240109