Metrics reloaded: recommendations for image analysis validation

Maier-Hein L, Reinke A, Godau P, Tizabi MD, Buettner F, Christodoulou E, Glocker B, Isensee F, Kleesiek J, Kozubek M, Reyes M, Riegler MA, Wiesenfarth M, Kavur AE, Sudre CH, Baumgartner M, Eisenmann M, Heckmann-Nötzel D, Rädsch T, Acion L, Antonelli M, Arbel T, Bakas S, Benis A, Blaschko MB, Cardoso MJ, Cheplygina V, Cimini BA, Collins GS, Farahani K, Ferrer L, Galdran A, van Ginneken B, Haase R, Hashimoto DA, Hoffman MM, Huisman M, Jannin P, Kahn CE, Kainmueller D, Kainz B, Karargyris A, Karthikesalingam A, Kofler F, Kopp-Schneider A, Kreshuk A, Kurc T, Landman BA, Litjens G, Madani A, Maier-Hein K, Martel AL, Mattson P, Meijering E, Menze B, Moons KG, Müller H, Nichyporuk B, Nickel F, Petersen J, Rajpoot N, Rieke N, Saez-Rodriguez J, Sánchez CI, Shetty S, van Smeden M, Summers RM, Taha AA, Tiulpin A, Tsaftaris SA, Van Calster B, Varoquaux G, Jäger PF (2024)


Publication Type: Journal article

Publication year: 2024

Journal: Nature Methods

Volume: 21

Pages Range: 195-212

Journal Issue: 2

DOI: 10.1038/s41592-023-02151-z

Abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint—a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
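The abstract frames semantic segmentation as a classification task at the pixel level. The short sketch below illustrates that framing. It is an illustrative example only, not code from the paper or from the Metrics Reloaded online tool. It assumes binary NumPy masks, and the helper names (`pixel_confusion_counts`, `accuracy`, `dice`) are hypothetical. The example shows how a metric that counts true negatives can look excellent for a small target structure even when the prediction misses that structure entirely. This is one way a chosen metric can fail to reflect the domain interest.

```python
import numpy as np


def pixel_confusion_counts(pred, ref):
    """Treat every pixel as one binary classification case and tally the outcomes."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference masks must have the same shape")
    tp = int(np.sum(pred & ref))    # foreground predicted as foreground
    fp = int(np.sum(pred & ~ref))   # background predicted as foreground
    fn = int(np.sum(~pred & ref))   # foreground missed by the prediction
    tn = int(np.sum(~pred & ~ref))  # background correctly left empty
    return tp, fp, fn, tn


def accuracy(pred, ref):
    """Fraction of correctly classified pixels; includes true negatives."""
    tp, fp, fn, tn = pixel_confusion_counts(pred, ref)
    return (tp + tn) / (tp + fp + fn + tn)


def dice(pred, ref):
    """Dice similarity coefficient, 2TP / (2TP + FP + FN); ignores true negatives."""
    tp, fp, fn, _ = pixel_confusion_counts(pred, ref)
    denom = 2 * tp + fp + fn
    if denom == 0:
        # Both masks are empty: the ratio is undefined, so return NaN
        # instead of silently picking 0 or 1.
        return float("nan")
    return 2 * tp / denom


# Hypothetical example: a 4-pixel target in a 10 x 10 image,
# and a prediction that misses the target entirely.
ref = np.zeros((10, 10), dtype=bool)
ref[4:6, 4:6] = True
pred = np.zeros_like(ref)

print(pixel_confusion_counts(pred, ref))  # (0, 0, 4, 96)
print(accuracy(pred, ref))                # 0.96
print(dice(pred, ref))                    # 0.0
```

Accuracy rewards the 96 correctly empty background pixels. The Dice score does not count true negatives, so it reports the complete miss. Which behavior is appropriate depends on the problem, and properties such as target size and class balance are what the problem fingerprint described in the abstract is intended to capture.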

Involved external institutions

Imperial College London / The Imperial College of Science, Technology and Medicine, United Kingdom (GB)
Deutsches Krebsforschungszentrum (DKFZ), Germany (DE)
Universitätsklinikum Essen, Germany (DE)
Masaryk University, Czech Republic (CZ)
Universität Bern, Switzerland (CH)
Simula Metropolitan Center for Digital Engineering (SimulaMet), Norway (NO)
University College London (UCL), United Kingdom (GB)
National Cancer Institute (NCI), United States (US)
University of Toronto, Canada (CA)
University of Zurich / Universität Zürich (UZH), Switzerland (CH)
University Medical Centre Utrecht (UMC Utrecht), Netherlands (NL)
Haute école spécialisée de Suisse occidentale (HES-SO) / Fachhochschule Westschweiz, Switzerland (CH)
Quebec AI Institute / Quebec Artificial Intelligence Institute / Montreal Institute for Learning Algorithms (MILA), Canada (CA)
Universitätsklinikum Hamburg-Eppendorf (UKE), Germany (DE)
University of Warwick, United Kingdom (GB)
Nvidia Corporation, United States (US)
Universidad de Buenos Aires (UBA) / University of Buenos Aires, Argentina (AR)
King’s College London, United Kingdom (GB)
McGill University, Canada (CA)
Indiana University – Purdue University Indianapolis, United States (US)
Holon Institute of Technology, Israel (IL)
Katholieke Universiteit Leuven (KUL) / Catholic University of Leuven, Belgium (BE)
IT University of Copenhagen, Denmark (DK)
Eli and Edythe L. Broad Institute of MIT and Harvard, United States (US)
University of Oxford, United Kingdom (GB)
Universitat Pompeu Fabra (UPF), Spain (ES)
Fraunhofer-Institut für Bildgestützte Medizin (MEVIS), Germany (DE)
Technische Universität Dresden, Germany (DE)
Perelman School of Medicine at the University of Pennsylvania, United States (US)
Princess Margaret Cancer Centre / Princess Margaret Hospital, Canada (CA)
Radboud University Nijmegen Medical Centre / Radboudumc (in full: Radboud Universitair Medisch Centrum), Netherlands (NL)
Université de Rennes 1 / University of Rennes 1, France (FR)
Penn Medicine, United States (US)
Max-Delbrück-Centrum für Molekulare Medizin / Max Delbrück Center for Molecular Medicine (MDC) Berlin-Buch, Germany (DE)
Institut de Chirurgie Guidée par l’Image de Strasbourg (IHU) / Institute of Image-Guided Surgery, France (FR)
Google Ireland Limited, Ireland (IE)
Helmholtz-Gemeinschaft / Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V., Germany (DE)
European Molecular Biology Laboratory (EMBL), Germany (DE)
State University of New York at Albany (SUNY Albany / UAlbany), United States (US)
Vanderbilt University, United States (US)
St. Luke's University Health Network (SLUHN), United States (US)
University of New South Wales (UNSW), Australia (AU)
Oulun Yliopisto / University of Oulu, Finland (FI)
University of Edinburgh, United Kingdom (GB)
Inria Saclay - Île-de-France Research Centre, France (FR)
Ruprecht-Karls-Universität Heidelberg, Germany (DE)
University of Amsterdam, Netherlands (NL)
National Institutes of Health Clinical Center, United States (US)
Technische Universität Wien / Vienna University of Technology, Austria (AT)

How to cite

APA:

Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M. D., Buettner, F., Christodoulou, E., ... Jäger, P. F. (2024). Metrics reloaded: recommendations for image analysis validation. Nature Methods, 21(2), 195-212. https://doi.org/10.1038/s41592-023-02151-z

MLA:

Maier-Hein, Lena, et al. "Metrics reloaded: recommendations for image analysis validation." Nature Methods 21.2 (2024): 195-212.
