Inconsistency between Human Observation and Deep Learning Models: Assessing Validity of Postmortem Computed Tomography Diagnosis of Drowning
Y. Zeng, X. Zhang, J. Wang, A. Usui, K. Ichiji, I. Bukovsky, S. Chou, M. Funayama, N. Homma
Language: English
Country: Switzerland
Document type: Journal Article
Grant support:
JP18K19892, Japan Society for the Promotion of Science
JP19H04479, Japan Society for the Promotion of Science
JP20K08012, Japan Society for the Promotion of Science
- MeSH
- Deep Learning *
- Child
- Adult
- Middle Aged
- Humans
- Adolescent
- Young Adult
- Autopsy * methods
- Tomography, X-Ray Computed * methods
- Postmortem Imaging
- Reproducibility of Results
- Retrospective Studies
- ROC Curve
- Aged, 80 and over
- Aged
- Drowning * diagnosis
- Check Tag
- Child
- Adult
- Middle Aged
- Humans
- Adolescent
- Young Adult
- Male
- Aged, 80 and over
- Aged
- Female
- Publication type
- Journal Article
Diagnosing drowning at autopsy is a complicated process, even with the assistance of autopsy imaging and on-site information from where the body was found. Previous studies have developed deep learning (DL) models that perform well for drowning diagnosis. However, the validity of those models was not assessed, raising doubts about whether the learned features accurately represent the medical findings observed by human experts. In this paper, we assessed the medical validity of DL models that had achieved high classification performance for drowning diagnosis. This retrospective study included autopsy cases aged 8-91 years who underwent postmortem computed tomography between 2012 and 2021 (153 drowning and 160 non-drowning cases). We first trained three DL models from a previous work and generated saliency maps that highlight important features in the input. To assess the validity of the models, four radiological technologists created pixel-level annotations, which were then compared quantitatively with the saliency maps. All three models demonstrated high classification performance, with areas under the receiver operating characteristic curve of 0.94, 0.97, and 0.98, respectively. However, the assessment revealed unexpected inconsistency between the annotations and the models' saliency maps: around 30%, 40%, and 80% of the area in the respective models' saliency maps was irrelevant, suggesting that the predictions of the DL models might be unreliable. The result calls for careful assessment of DL tools, even those with high classification performance.
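The abstract compares saliency maps against pixel-level annotations, and a small sketch can illustrate that kind of comparison. This record does not state how the authors computed the share of irrelevant area, so the Python function below is an assumption, not the paper's method. It binarizes a saliency map at a threshold and reports the fraction of salient pixels that fall outside the annotated region. The name `irrelevant_fraction`, the min-max normalization, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def irrelevant_fraction(saliency, annotation, threshold=0.5):
    """Fraction of salient pixels lying outside the annotated region.

    Assumed metric for illustration only; the paper's own measure
    is not described in this record.
    """
    saliency = np.asarray(saliency, dtype=float)
    annotation = np.asarray(annotation, dtype=bool)
    if saliency.shape != annotation.shape:
        raise ValueError("saliency and annotation must have the same shape")

    # Min-max rescale to [0, 1] so one threshold works across maps.
    span = saliency.max() - saliency.min()
    if span == 0:
        return float("nan")  # a flat map has no salient region
    normalized = (saliency - saliency.min()) / span

    # Pixels at or above the threshold count as "salient".
    salient = normalized >= threshold
    n_salient = int(salient.sum())
    if n_salient == 0:
        return float("nan")

    # Salient pixels that the annotators did not mark.
    outside = int(np.logical_and(salient, ~annotation).sum())
    return outside / n_salient


# Toy 4x4 example: four salient pixels, two of them inside the annotation.
toy_saliency = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
toy_annotation = np.zeros((4, 4), dtype=bool)
toy_annotation[0:2, 0:3] = True  # annotated block: rows 0-1, columns 0-2

print(irrelevant_fraction(toy_saliency, toy_annotation))
```

Under this assumed definition, a value near 0 means the highlighted region sits almost entirely inside what the annotators marked, while a value near 1 means the model relies mostly on areas the annotators did not consider relevant.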
Faculty of Science, University of South Bohemia in Ceske Budejovice, Ceske Budejovice, Czech Republic
Mechanical Engineering, Czech Technical University in Prague, Prague, Czech Republic
National Institute of Technology, Sendai College, Sendai, Japan
- 000
- 00000naa a2200000 a 4500
- 001
- bmc24013785
- 003
- CZ-PrNML
- 005
- 20240905134421.0
- 007
- ta
- 008
- 240725s2024 sz f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1007/s10278-024-00974-6 $2 doi
- 035 __
- $a (PubMed)38336949
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a sz
- 100 1_
- $a Zeng, Yuwen $u Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan. yuwen@tohoku.ac.jp $1 https://orcid.org/0000000336157766
- 245 10
- $a Inconsistency between Human Observation and Deep Learning Models: Assessing Validity of Postmortem Computed Tomography Diagnosis of Drowning / $c Y. Zeng, X. Zhang, J. Wang, A. Usui, K. Ichiji, I. Bukovsky, S. Chou, M. Funayama, N. Homma
- 520 9_
- $a Diagnosing drowning at autopsy is a complicated process, even with the assistance of autopsy imaging and on-site information from where the body was found. Previous studies have developed deep learning (DL) models that perform well for drowning diagnosis. However, the validity of those models was not assessed, raising doubts about whether the learned features accurately represent the medical findings observed by human experts. In this paper, we assessed the medical validity of DL models that had achieved high classification performance for drowning diagnosis. This retrospective study included autopsy cases aged 8-91 years who underwent postmortem computed tomography between 2012 and 2021 (153 drowning and 160 non-drowning cases). We first trained three DL models from a previous work and generated saliency maps that highlight important features in the input. To assess the validity of the models, four radiological technologists created pixel-level annotations, which were then compared quantitatively with the saliency maps. All three models demonstrated high classification performance, with areas under the receiver operating characteristic curve of 0.94, 0.97, and 0.98, respectively. However, the assessment revealed unexpected inconsistency between the annotations and the models' saliency maps: around 30%, 40%, and 80% of the area in the respective models' saliency maps was irrelevant, suggesting that the predictions of the DL models might be unreliable. The result calls for careful assessment of DL tools, even those with high classification performance.
- 650 _2
- $a Humans $7 D006801
- 650 12
- $a Deep Learning $7 D000077321
- 650 12
- $a Drowning $x diagnosis $7 D004332
- 650 _2
- $a Aged $7 D000368
- 650 _2
- $a Child $7 D002648
- 650 _2
- $a Aged, 80 and over $7 D000369
- 650 12
- $a Tomography, X-Ray Computed $x methods $7 D014057
- 650 _2
- $a Adolescent $7 D000293
- 650 _2
- $a Adult $7 D000328
- 650 12
- $a Autopsy $x methods $7 D001344
- 650 _2
- $a Middle Aged $7 D008875
- 650 _2
- $a Female $7 D005260
- 650 _2
- $a Retrospective Studies $7 D012189
- 650 _2
- $a Male $7 D008297
- 650 _2
- $a Young Adult $7 D055815
- 650 _2
- $a ROC Curve $7 D012372
- 650 _2
- $a Reproducibility of Results $7 D015203
- 650 _2
- $a Postmortem Imaging $7 D000097873
- 655 _2
- $a Journal Article $7 D016428
- 700 1_
- $a Zhang, Xiaoyong $u National Institute of Technology, Sendai College, Sendai, Japan
- 700 1_
- $a Wang, Jiaoyang $u Department of Intelligent Biomedical System Engineering, Graduate School of Biomedical Engineering, Tohoku University, Sendai, Japan
- 700 1_
- $a Usui, Akihito $u Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan
- 700 1_
- $a Ichiji, Kei $u Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan
- 700 1_
- $a Bukovsky, Ivo $u Faculty of Science, University of South Bohemia in Ceske Budejovice, Ceske Budejovice, Czech Republic $u Mechanical Engineering, Czech Technical University in Prague, Prague, Czech Republic
- 700 1_
- $a Chou, Shuoyan $u Department of Industrial Management, National Taiwan University of Science and Technology, Taipei, Taiwan
- 700 1_
- $a Funayama, Masato $u Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan
- 700 1_
- $a Homma, Noriyasu $u Department of Radiological Imaging and Informatics, Tohoku University Graduate School of Medicine, Sendai, Japan
- 773 0_
- $w MED00215148 $t Journal of imaging informatics in medicine $x 2948-2933 $g Vol. 37, No. 3 (2024), pp. 1-10
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/38336949 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y - $z 0
- 990 __
- $a 20240725 $b ABA008
- 991 __
- $a 20240905134415 $b ABA008
- 999 __
- $a ok $b bmc $g 2143537 $s 1225651
- BAS __
- $a 3
- BAS __
- $a PreBMC-MEDLINE
- BMC __
- $a 2024 $b 37 $c 3 $d 1-10 $e 20240209 $i 2948-2933 $m Journal of imaging informatics in medicine $n J Imaging Inform Med $x MED00215148
- GRA __
- $a JP18K19892 $p Japan Society for the Promotion of Science
- GRA __
- $a JP19H04479 $p Japan Society for the Promotion of Science
- GRA __
- $a JP20K08012 $p Japan Society for the Promotion of Science
- LZP __
- $a Pubmed-20240725