Metrics reloaded: recommendations for image analysis validation

2024 Feb; 21(2): 195-212. [epub] 20240212

Language: English. Country: United States. Medium: print-electronic

Document type: journal articles, reviews

Persistent link: https://www.medvik.cz/link/pmid38347141

Grant support
UH3 CA225021 NCI NIH HHS - United States
U01 CA242871 NCI NIH HHS - United States
U24 CA279629 NCI NIH HHS - United States
R01 NS042645 NINDS NIH HHS - United States
P41 GM135019 NIGMS NIH HHS - United States
U24 CA215109 NCI NIH HHS - United States
EP-W-17-011 EPA - United States
U24 CA180924 NCI NIH HHS - United States

Links

PubMed 38347141
PubMed Central PMC11182665
DOI 10.1038/s41592-023-02151-z
PII: 10.1038/s41592-023-02151-z
Knihovny.cz E-resources

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
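As a rough illustration of what a "structured representation of the given problem" could look like in code, the following minimal Python sketch groups the aspects listed in the abstract into one record. The class name `ProblemFingerprint`, its field names, the example property `small_structures` and the mapping from problem category to classification level are all assumptions made for illustration; they are not taken from the Metrics Reloaded online tool or its implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProblemCategory(Enum):
    """The four problem categories named in the abstract."""
    IMAGE_LEVEL_CLASSIFICATION = "image-level classification"
    OBJECT_DETECTION = "object detection"
    SEMANTIC_SEGMENTATION = "semantic segmentation"
    INSTANCE_SEGMENTATION = "instance segmentation"


# Assumed mapping from category to the level(s) at which the task is
# interpreted as classification; illustrative only, not the tool's logic.
CLASSIFICATION_LEVELS = {
    ProblemCategory.IMAGE_LEVEL_CLASSIFICATION: ("image",),
    ProblemCategory.OBJECT_DETECTION: ("object",),
    ProblemCategory.SEMANTIC_SEGMENTATION: ("pixel",),
    ProblemCategory.INSTANCE_SEGMENTATION: ("object", "pixel"),
}


@dataclass
class ProblemFingerprint:
    """Hypothetical container for the aspects the abstract lists."""
    category: ProblemCategory
    domain_interest: dict = field(default_factory=dict)
    target_structure: dict = field(default_factory=dict)
    dataset: dict = field(default_factory=dict)
    algorithm_output: dict = field(default_factory=dict)

    def classification_levels(self):
        """Return the assumed classification level(s) for this problem."""
        return CLASSIFICATION_LEVELS[self.category]


# Example: an instance segmentation problem with one hypothetical property.
fingerprint = ProblemFingerprint(
    category=ProblemCategory.INSTANCE_SEGMENTATION,
    target_structure={"small_structures": True},
)
print(fingerprint.classification_levels())
```

A real metric recommendation would depend on the full set of fingerprint items defined in the paper; this sketch only shows how such items might be collected in one place before any selection logic is applied.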

ARTORG Center for Biomedical Engineering Research University of Bern Bern Switzerland

Australian Institute for Machine Learning AIML University of Adelaide Adelaide South Australia Australia

BCN Medtech Universitat Pompeu Fabra Barcelona Spain

Cell Biology and Biophysics Unit European Molecular Biology Laboratory Heidelberg Germany

Center for Biomedical Image Computing and Analytics University of Pennsylvania Philadelphia PA USA

Center for Biomedical Informatics and Information Technology National Cancer Institute Bethesda MD USA

Center for Processing Speech and Images Department of Electrical Engineering KU Leuven Leuven Belgium

Center for Scalable Data Analytics and Artificial Intelligence Leipzig University Leipzig Germany

Center for Systems Biology Dresden Germany

Centre for Biomedical Image Analysis and Faculty of Informatics Masaryk University Brno Czech Republic

Centre for Intelligent Machines and MILA McGill University Montréal Quebec Canada

Centre for Medical Image Computing University College London London UK

Centre for Statistics in Medicine University of Oxford Nuffield Orthopaedic Centre Oxford UK

Department AIBE Friedrich Alexander Universität Erlangen Nürnberg Germany

Department of Biomedical Data Sciences Leiden University Medical Center Leiden the Netherlands

Department of Biomedical Informatics Stony Brook University Health Science Center Stony Brook NY USA

Department of Computer Science IT University of Copenhagen Copenhagen Denmark

Department of Computer Science UiT The Arctic University of Norway Tromsø Norway

Department of Computer Science University of Toronto Toronto Ontario Canada

Department of Computing Faculty of Engineering Imperial College London London UK

Department of Computing Imperial College London South Kensington Campus London UK

Department of Development and Regeneration and EPI centre KU Leuven Leuven Belgium

Department of Digital Medical Technologies Holon Institute of Technology Holon Israel

Department of General Visceral and Thoracic Surgery University Medical Center Hamburg Eppendorf Hamburg Germany

Department of Informatics Goethe University Frankfurt Frankfurt am Main Germany

Department of Medical Biophysics University of Toronto Toronto Ontario Canada

Department of Medicine Goethe University Frankfurt Frankfurt am Main Germany

Department of Pathology Radboud University Medical Center Nijmegen the Netherlands

Department of Quantitative Biomedicine University of Zurich Zurich Switzerland

Department of Radiation Oncology University Hospital Bern University of Bern Bern Switzerland

Department of Radiology and Institute for Biomedical Informatics University of Pennsylvania Philadelphia PA USA

Department of Radiology and Nuclear Medicine Radboud University Medical Center Nijmegen the Netherlands

Department of Surgery Perelman School of Medicine Philadelphia PA USA

Department of Surgery University Health Network Philadelphia PA USA

Digital Engineering Faculty University of Potsdam Potsdam Germany

Division of Computational Pathology Department of Pathology and Laboratory Medicine Indiana University School of Medicine IU Health Information and Translational Sciences Building Indianapolis IN USA

Electrical Engineering Vanderbilt University Nashville TN USA

European Federation for Medical Informatics Le Mont sur Lausanne Switzerland

Faculty of Mathematics and Computer Science Heidelberg University Heidelberg Germany

Faculty of Medicine Heidelberg University Hospital Heidelberg Germany

Frankfurt Cancer Institute Frankfurt am Main Germany

Fraunhofer MEVIS Bremen Germany

General Robotics Automation Sensing and Perception Laboratory School of Engineering and Applied Science University of Pennsylvania Philadelphia PA USA

German Cancer Consortium partner site Frankfurt Mainz a partnership between DKFZ and UCT Frankfurt Marburg Frankfurt am Main Germany

German Cancer Research Center Heidelberg Division of Biostatistics Heidelberg Germany

German Cancer Research Center Heidelberg Division of Intelligent Medical Systems Heidelberg Germany

German Cancer Research Center Heidelberg Division of Medical Image Computing Heidelberg Germany

German Cancer Research Center Heidelberg Heidelberg Germany

German Cancer Research Center Heidelberg HI Applied Computer Vision Lab Heidelberg Germany

German Cancer Research Center Heidelberg HI Helmholtz Imaging Heidelberg Germany

German Cancer Research Center Heidelberg Interactive Machine Learning Group Heidelberg Germany

Google 1600 Amphitheatre Pkwy Mountain View CA USA

Google Health DeepMind London UK

Google Health Google Palo Alto CA USA

Helmholtz AI Oberschleißheim Germany

IHU Strasbourg Strasbourg France

Imaging Platform Broad Institute of MIT and Harvard Cambridge MA USA

Informatics Institute Faculty of Science University of Amsterdam Amsterdam the Netherlands

Information Systems Institute University of Applied Sciences Western Switzerland Sierre Switzerland

INSERM Paris France

Institute for AI in Medicine University Medicine Essen Essen Germany

Institute for Computational Biomedicine Heidelberg University Heidelberg Germany

Institute of Information Systems Engineering TU Wien Vienna Austria

Instituto de Cálculo CONICET Universidad de Buenos Aires Buenos Aires Argentina

Instituto de Investigación en Ciencias de la Computación CONICET UBA Ciudad Autónoma de Buenos Aires Buenos Aires Argentina

Julius Center for Health Sciences and Primary Care UMC Utrecht Utrecht University Utrecht the Netherlands

Laboratoire Traitement du Signal et de l'Image UMR_S 1099 Université de Rennes 1 Rennes France

Max Delbrück Center for Molecular Medicine in the Helmholtz Association Biomedical Image Analysis and HI Helmholtz Imaging Berlin Germany

Medical Faculty Heidelberg University Heidelberg Germany

Medical Faculty University of Geneva Geneva Switzerland

MILA Montréal Quebec Canada

MRC Unit for Lifelong Health and Ageing at UCL and Centre for Medical Image Computing Department of Computer Science University College London London UK

National Center for Tumor Diseases NCT Heidelberg a partnership between DKFZ and University Medical Center Heidelberg Heidelberg Germany

National Institutes of Health Clinical Center Bethesda MD USA

Neurocenter Oulu Oulu University Hospital Oulu Finland

NVIDIA München Germany

Parietal project team INRIA Saclay Île de France Palaiseau France

Pattern Analysis and Learning Group Department of Radiation Oncology Heidelberg University Hospital Heidelberg Germany

Physical Sciences Sunnybrook Research Institute Toronto Ontario Canada

Princess Margaret Cancer Centre University Health Network Toronto Ontario Canada

Radboud Institute for Health Sciences Radboud University Medical Center Nijmegen the Netherlands

Research Unit of Health Sciences and Technology Faculty of Medicine University of Oulu Oulu Finland

School of Biomedical Engineering and Imaging Science King's College London London UK

School of Computer Science and Engineering University of New South Wales UNSW Sydney Kensington New South Wales Australia

School of Engineering The University of Edinburgh Edinburgh Scotland

Simula Metropolitan Center for Digital Engineering Oslo Norway

Technische Universität Dresden DFG Cluster of Excellence 'Physics of Life' Dresden Germany

Tissue Image Analytics Laboratory Department of Computer Science University of Warwick Coventry UK

Vector Institute for Artificial Intelligence Toronto Ontario Canada


Most recent citing articles


Topology-preserving contourwise shape fusion

2025 Mar 28; 15(1): 10713. [epub] 20250328

Optimized molecule detection in localization microscopy with selected false positive probability

2025 Jan 11; 16(1): 601. [epub] 20250111
