Inconsistency between Human Observation and Deep Learning Models: Assessing Validity of Postmortem Computed Tomography Diagnosis of Drowning

. 2024 Jun ; 37 (3) : 1-10. [epub] 20240209

Language: English · Country: Switzerland · Medium: print-electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid38336949

Grant support
JP18K19892 Japan Society for the Promotion of Science
JP19H04479 Japan Society for the Promotion of Science
JP20K08012 Japan Society for the Promotion of Science

Links

PubMed 38336949
PubMed Central PMC11169324
DOI 10.1007/s10278-024-00974-6
PII: 10.1007/s10278-024-00974-6
Knihovny.cz E-resources

Diagnosing drowning at autopsy is complicated, even with the assistance of autopsy imaging and on-site information from where the body was found. Previous studies have developed high-performing deep learning (DL) models for drowning diagnosis. However, the validity of these models was not assessed, leaving it unclear whether the learned features accurately represent the medical findings observed by human experts. In this paper, we assessed the medical validity of DL models that had achieved high classification performance for drowning diagnosis. This retrospective study included autopsy cases of individuals aged 8-91 years who underwent postmortem computed tomography between 2012 and 2021 (153 drowning and 160 non-drowning cases). We first trained three DL models from a previous work and generated saliency maps that highlight the input features most important to each model's prediction. To assess the validity of the models, four radiological technologists created pixel-level annotations, which were then compared quantitatively with the saliency maps. All three models demonstrated high classification performance, with areas under the receiver operating characteristic curve of 0.94, 0.97, and 0.98, respectively. However, the assessment revealed unexpected inconsistency between the annotations and the models' saliency maps: approximately 30%, 40%, and 80% of the areas in the saliency maps of the three models, respectively, were irrelevant, suggesting that the predictions of the DL models might be unreliable. These results underscore the need for careful assessment of DL tools, even those with high classification performance.
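As a rough illustration of how a saliency map might be compared with a pixel-level annotation, the Python sketch below computes the share of the salient area that falls outside the annotated region. The abstract does not specify the exact metric, normalization, or threshold used in the study, so the function name `irrelevant_fraction`, the min-max normalization, and the fixed threshold of 0.5 are all assumptions made for illustration only.

```python
import numpy as np


def irrelevant_fraction(saliency, annotation, threshold=0.5):
    """Fraction of the salient area lying outside the expert annotation.

    Hypothetical helper: the normalization and threshold are illustrative
    assumptions, not the procedure reported in the paper.
    """
    saliency = np.asarray(saliency, dtype=float)
    annotation = np.asarray(annotation, dtype=bool)
    if saliency.shape != annotation.shape:
        raise ValueError("saliency and annotation must have the same shape")

    # Min-max normalize to [0, 1]; a constant map has no salient region.
    span = saliency.max() - saliency.min()
    if span == 0:
        return float("nan")
    normalized = (saliency - saliency.min()) / span

    # Binarize the map: pixels at or above the threshold count as salient.
    salient = normalized >= threshold
    n_salient = int(salient.sum())
    if n_salient == 0:
        return float("nan")

    # Salient pixels that the annotators did not mark as relevant.
    n_outside = int(np.logical_and(salient, ~annotation).sum())
    return n_outside / n_salient


# Toy 4x4 example: the model highlights the top-right 2x2 block,
# while the annotation marks only the rightmost column.
toy_saliency = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
toy_annotation = np.zeros((4, 4), dtype=bool)
toy_annotation[:, 3] = True

toy_result = irrelevant_fraction(toy_saliency, toy_annotation)
print(f"irrelevant fraction: {toy_result:.2f}")
```

In the toy case, two of the four salient pixels lie outside the annotation, giving a fraction of 0.50. With several annotators, one plausible extension would be to score each map against the union or the consensus of their masks, but that choice is likewise an assumption here.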


