The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study
G. Mårtensson, D. Ferreira, T. Granberg, L. Cavallin, K. Oppedal, A. Padovani, I. Rektorova, L. Bonanni, M. Pardini, MG. Kramberger, JP. Taylor, J. Hort, J. Snædal, J. Kulisevsky, F. Blanc, A. Antonini, P. Mecocci, B. Vellas, M. Tsolaki, I....
Language English Country Netherlands
Document type Journal Article, Research Support, N.I.H., Extramural, Research Support, Non-U.S. Gov't
Grant support
U01 AG024904
NIA NIH HHS - United States
W81XWH-12-2-0012
Department of Defense - International
- MeSH
- Deep Learning * MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Brain diagnostic imaging MeSH
- Neural Networks, Computer MeSH
- Reproducibility of Results MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 brain MRI scans from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse to clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, including data from a wider range of scanners and protocols improved performance in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data before they are certified for deployment.
Centre for Age Related Medicine Stavanger University Hospital Stavanger Norway
Centro de Investigación en Red Enfermedades Neurodegenerativas Barcelona Spain
Department of Clinical Neuroscience Karolinska Institutet Stockholm Sweden
Department of Electrical Engineering and Computer Science University of Stavanger Stavanger Norway
Department of Psychiatry Warneford Hospital University of Oxford Oxford UK
Department of Radiology Karolinska University Hospital Stockholm Sweden
Institut d'Investigacions Biomédiques Sant Pau Barcelona Spain
Institute of Clinical Medicine Neurology University of Eastern Finland Finland
Institute of Gerontology and Geriatrics University of Perugia Perugia Italy
Institute of Neuroscience Newcastle University Newcastle upon Tyne UK
Institute of Psychiatry Psychology and Neuroscience King's College London London UK
Landspitali University Hospital Reykjavik Iceland
Medical University of Lodz Lodz Poland
Movement Disorders Unit Neurology Department Sant Pau Hospital Barcelona Spain
Neurocenter Neurology Kuopio University Hospital Kuopio Finland
Neurology Unit Department of Clinical and Experimental Sciences University of Brescia Brescia Italy
NIHR Biomedical Research Centre for Mental Health London UK
NIHR Biomedical Research Unit for Dementia London UK
UMR INSERM 1027 gerontopole CHU University of Toulouse France
Universitat Autónoma de Barcelona Barcelona Spain
University of Strasbourg and French National Centre for Scientific Research ICONE Strasbourg France
- 000
- 00000naa a2200000 a 4500
- 001
- bmc21019756
- 003
- CZ-PrNML
- 005
- 20210830101344.0
- 007
- ta
- 008
- 210728s2020 ne f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1016/j.media.2020.101714 $2 doi
- 035 __
- $a (PubMed)33007638
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a ne
- 100 1_
- $a Mårtensson, Gustav $u Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden. Electronic address: gustav.martensson@ki.se
- 245 14
- $a The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study / $c G. Mårtensson, D. Ferreira, T. Granberg, L. Cavallin, K. Oppedal, A. Padovani, I. Rektorova, L. Bonanni, M. Pardini, MG. Kramberger, JP. Taylor, J. Hort, J. Snædal, J. Kulisevsky, F. Blanc, A. Antonini, P. Mecocci, B. Vellas, M. Tsolaki, I. Kłoszewska, H. Soininen, S. Lovestone, A. Simmons, D. Aarsland, E. Westman
- 520 9_
- $a Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 brain MRI scans from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse to clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, including data from a wider range of scanners and protocols improved performance in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data before they are certified for deployment.
- 650 _2
- $a brain $x diagnostic imaging $7 D001921
- 650 12
- $a deep learning $7 D000077321
- 650 _2
- $a humans $7 D006801
- 650 _2
- $a magnetic resonance imaging $7 D008279
- 650 _2
- $a neural networks, computer $7 D016571
- 650 _2
- $a reproducibility of results $7 D015203
- 655 _2
- $a journal article $7 D016428
- 655 _2
- $a Research Support, N.I.H., Extramural $7 D052061
- 655 _2
- $a Research Support, Non-U.S. Gov't $7 D013485
- 700 1_
- $a Ferreira, Daniel $u Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- 700 1_
- $a Granberg, Tobias $u Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden; Department of Radiology, Karolinska University Hospital, Stockholm, Sweden
- 700 1_
- $a Cavallin, Lena $u Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden; Department of Radiology, Karolinska University Hospital, Stockholm, Sweden
- 700 1_
- $a Oppedal, Ketil $u Centre for Age-Related Medicine, Stavanger University Hospital, Stavanger, Norway; Stavanger Medical Imaging Laboratory (SMIL), Department of Radiology, Stavanger University Hospital, Stavanger, Norway; Department of Electrical Engineering and Computer Science, University of Stavanger, Stavanger, Norway
- 700 1_
- $a Padovani, Alessandro $u Neurology Unit, Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
- 700 1_
- $a Rektorova, Irena $u 1st Department of Neurology, Medical Faculty, St. Anne's Hospital and CEITEC, Masaryk University, Brno, Czech Republic
- 700 1_
- $a Bonanni, Laura $u Department of Neuroscience Imaging and Clinical Sciences and CESI, University G d'Annunzio of Chieti-Pescara, Chieti, Italy
- 700 1_
- $a Pardini, Matteo $u Department of Neuroscience (DINOGMI), University of Genoa and Neurology Clinics, Polyclinic San Martino Hospital, Genoa, Italy
- 700 1_
- $a Kramberger, Milica G $u Department of Neurology, University Medical Centre Ljubljana, Medical faculty, University of Ljubljana, Slovenia
- 700 1_
- $a Taylor, John-Paul $u Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
- 700 1_
- $a Hort, Jakub $u Memory Clinic, Department of Neurology, Charles University, 2nd Faculty of Medicine and Motol University Hospital, Prague, Czech Republic
- 700 1_
- $a Snædal, Jón $u Landspitali University Hospital, Reykjavik, Iceland
- 700 1_
- $a Kulisevsky, Jaime $u Movement Disorders Unit, Neurology Department, Sant Pau Hospital, Barcelona, Spain; Institut d'Investigacions Biomédiques Sant Pau (IIB-Sant Pau), Barcelona, Spain; Centro de Investigación en Red-Enfermedades Neurodegenerativas (CIBERNED), Barcelona, Spain; Universitat Autónoma de Barcelona (U.A.B.), Barcelona, Spain
- 700 1_
- $a Blanc, Frederic $u Day Hospital of Geriatrics, Memory Resource and Research Centre (CM2R) of Strasbourg, Department of Geriatrics, Hôpitaux Universitaires de Strasbourg, Strasbourg, France; University of Strasbourg and French National Centre for Scientific Research (CNRS), ICube Laboratory and Fédération de Médecine Translationnelle de Strasbourg (FMTS), Team Imagerie Multimodale Intégrative en Santé (IMIS)/ICONE, Strasbourg, France
- 700 1_
- $a Antonini, Angelo $u Department of Neuroscience, University of Padua, Padua & Fondazione Ospedale San Camillo, Venezia, Venice, Italy
- 700 1_
- $a Mecocci, Patrizia $u Institute of Gerontology and Geriatrics, University of Perugia, Perugia, Italy
- 700 1_
- $a Vellas, Bruno $u UMR INSERM 1027, gerontopole, CHU, University of Toulouse, France
- 700 1_
- $a Tsolaki, Magda $u 3rd Department of Neurology, Memory and Dementia Unit, Aristotle University of Thessaloniki, Thessaloniki, Greece
- 700 1_
- $a Kłoszewska, Iwona $u Medical University of Lodz, Lodz, Poland
- 700 1_
- $a Soininen, Hilkka $u Institute of Clinical Medicine, Neurology, University of Eastern Finland, Finland; Neurocenter, Neurology, Kuopio University Hospital, Kuopio, Finland
- 700 1_
- $a Lovestone, Simon $u Department of Psychiatry, Warneford Hospital, University of Oxford, Oxford, UK
- 700 1_
- $a Simmons, Andrew $u NIHR Biomedical Research Centre for Mental Health, London, UK; NIHR Biomedical Research Unit for Dementia, London, UK; Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- 700 1_
- $a Aarsland, Dag $u Centre for Age-Related Medicine, Stavanger University Hospital, Stavanger, Norway; Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- 700 1_
- $a Westman, Eric $u Division of Clinical Geriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden; Department of Neuroimaging, Centre for Neuroimaging Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- 773 0_
- $w MED00007107 $t Medical image analysis $x 1361-8423 $g Vol. 66, No. - (2020), p. 101714
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/33007638 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y p $z 0
- 990 __
- $a 20210728 $b ABA008
- 991 __
- $a 20210830101344 $b ABA008
- 999 __
- $a ok $b bmc $g 1690546 $s 1140202
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2020 $b 66 $c - $d 101714 $e 20200501 $i 1361-8423 $m Medical image analysis $n Med Image Anal $x MED00007107
- GRA __
- $a U01 AG024904 $p NIA NIH HHS $2 United States
- GRA __
- $a W81XWH-12-2-0012 $p Department of Defense $2 International
- LZP __
- $a Pubmed-20210728