Understanding metric-related pitfalls in image analysis validation

2024 Feb; 21(2): 182-194. [epub] 20240212

Language: English. Country: United States. Medium: print-electronic

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid38347140

Grant support
213038 Wellcome Trust - United Kingdom
UH3 CA225021 NCI NIH HHS - United States
203148 Wellcome Trust - United Kingdom
U01 CA242871 NCI NIH HHS - United States
U24 CA279629 NCI NIH HHS - United States
R01 NS042645 NINDS NIH HHS - United States
P41 GM135019 NIGMS NIH HHS - United States
Wellcome Trust - United Kingdom
U24 CA215109 NCI NIH HHS - United States
EP-W-17-011 EPA - United States
U24 CA180924 NCI NIH HHS - United States

Links

PubMed 38347140
PubMed Central PMC11181963
DOI 10.1038/s41592-023-02150-0
PII: 10.1038/s41592-023-02150-0

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.

Allen Institute for Cell Science Seattle WA USA

ARTORG Center for Biomedical Engineering Research University of Bern Bern Switzerland

Cell Biology and Biophysics Unit European Molecular Biology Laboratory Heidelberg Germany

Center for Biomedical Image Computing and Analytics University of Pennsylvania Philadelphia PA USA

Center for Biomedical Informatics and Information Technology National Cancer Institute Bethesda MD USA

Centre for Biomedical Image Analysis and Faculty of Informatics Masaryk University Brno Czech Republic

Centre for Intelligent Machines and MILA McGill University Montréal Quebec Canada

Centre for Medical Image Computing University College London London UK

Department AIBE Friedrich Alexander Universität Erlangen Nürnberg Germany

Department of Biomedical Data Sciences Leiden University Medical Center Leiden the Netherlands

Department of Biomedical Informatics Stony Brook University Health Science Center Stony Brook NY USA

Department of Computer Science IT University of Copenhagen Copenhagen Denmark

Department of Computer Science University of Toronto Toronto Ontario Canada

Department of Computing Faculty of Engineering Imperial College London London UK

Department of Computing Imperial College London South Kensington Campus London UK

Department of Development and Regeneration and EPI centre KU Leuven Leuven Belgium

Department of Digital Medical Technologies Holon Institute of Technology Holon Israel

Department of General Visceral and Thoracic Surgery University Medical Center Hamburg Eppendorf Hamburg Germany

Department of Medical Biophysics University of Toronto Toronto Ontario Canada

Department of Pathology Radboud University Medical Center Nijmegen the Netherlands

Department of Quantitative Biomedicine University of Zurich Zurich Switzerland

Department of Radiation Oncology University Hospital Bern University of Bern Bern Switzerland

Department of Radiology and Institute for Biomedical Informatics University of Pennsylvania Philadelphia PA USA

Department of Radiology and Nuclear Medicine Radboud University Medical Center Nijmegen the Netherlands

Department of Surgery Perelman School of Medicine Philadelphia PA USA

Department of Surgery University Health Network Philadelphia PA USA

Division of Computational Pathology Dept of Pathology and Laboratory Medicine Indiana University School of Medicine Indianapolis IN USA

Electrical Engineering Vanderbilt University Nashville TN USA

European Federation for Medical Informatics Le Mont sur Lausanne Switzerland

Faculty of Mathematics and Computer Science Heidelberg University Heidelberg Germany

Faculty of Medicine Heidelberg University Hospital Heidelberg Germany

Frankfurt Cancer Institute Frankfurt am Main Germany

Fraunhofer MEVIS Bremen Germany

General Robotics Automation Sensing and Perception Laboratory School of Engineering and Applied Science University of Pennsylvania Philadelphia PA USA

German Cancer Consortium partner site Frankfurt Mainz a partnership between DKFZ and UCT Frankfurt Marburg Frankfurt am Main Germany

German Cancer Research Center Heidelberg Division of Biostatistics Heidelberg Germany

German Cancer Research Center Heidelberg Division of Intelligent Medical Systems Heidelberg Germany

German Cancer Research Center Heidelberg Division of Medical Image Computing Heidelberg Germany

German Cancer Research Center Heidelberg Heidelberg Germany

German Cancer Research Center Heidelberg HI Applied Computer Vision Lab Heidelberg Germany

German Cancer Research Center Heidelberg HI Helmholtz Imaging Heidelberg Germany

German Cancer Research Center Heidelberg Interactive Machine Learning Group Heidelberg Germany

Goethe University Frankfurt Department of Informatics Frankfurt am Main Germany

Goethe University Frankfurt Department of Medicine Frankfurt am Main Germany

Google Health Google Palo Alto CA USA

Helmholtz AI Oberschleißheim Germany

IHU Strasbourg Strasbourg France

Imaging Platform Broad Institute of MIT and Harvard Cambridge MA USA

Informatics Institute Faculty of Science University of Amsterdam Amsterdam the Netherlands

Information Systems Institute University of Applied Sciences Western Switzerland Sierre Switzerland

INSERM Paris France

Institute for Computational Biomedicine Heidelberg University Heidelberg Germany

Institute of Information Systems Engineering TU Wien Vienna Austria

Instituto de Cálculo CONICET Universidad de Buenos Aires Buenos Aires Argentina

Instituto de Investigación en Ciencias de la Computación CONICET UBA Ciudad Autónoma de Buenos Aires Buenos Aires Argentina

Julius Center for Health Sciences and Primary Care UMC Utrecht Utrecht University Utrecht the Netherlands

Laboratoire Traitement du Signal et de l'Image UMR_S 1099 Université de Rennes 1 Rennes France

Leibniz Institut für Analytische Wissenschaften ISAS e.V. Dortmund Germany

Lunit Seoul South Korea

Max Delbrück Center for Molecular Medicine in the Helmholtz Association Biomedical Image Analysis and HI Helmholtz Imaging Berlin Germany

Medical Faculty University of Geneva Geneva Switzerland

MILA Montréal Quebec Canada

MRC Unit for Lifelong Health and Ageing at UCL and Centre for Medical Image Computing Department of Computer Science University College London London UK

National Center for Tumor Diseases NCT Heidelberg a partnership between DKFZ and University Medical Center Heidelberg Heidelberg Germany

National Institute of Allergy and Infectious Diseases Bethesda MD USA

National Institutes of Health Clinical Center Bethesda MD USA

Neurocenter Oulu Oulu University Hospital Oulu Finland

NVIDIA GmbH München Germany

Parietal project team INRIA Saclay Île de France Palaiseau France

Pattern Analysis and Learning Group Department of Radiation Oncology Heidelberg University Hospital Heidelberg Germany

Physical Sciences Sunnybrook Research Institute Toronto Ontario Canada

Princess Margaret Cancer Centre University Health Network Toronto Ontario Canada

Radboud Institute for Health Sciences Radboud University Medical Center Nijmegen the Netherlands

Research Unit of Health Sciences and Technology Faculty of Medicine University of Oulu Oulu Finland

School of Biomedical Engineering and Imaging Science King's College London London UK

School of Computer Science and Engineering University of New South Wales UNSW Sydney Kensington New South Wales Australia

School of Engineering The University of Edinburgh Edinburgh Scotland

Simula Metropolitan Center for Digital Engineering Oslo Norway

Tissue Image Analytics Laboratory Department of Computer Science University of Warwick Coventry UK

Translational Image guided Oncology University Medicine Essen Essen Germany

UiT The Arctic University of Norway Tromsø Norway

Universitat Pompeu Fabra Barcelona Spain

University of Adelaide Adelaide South Australia Australia

University of Potsdam Digital Engineering Faculty Potsdam Germany

Vector Institute for Artificial Intelligence Toronto Ontario Canada


