重新加载的度量:坑和图像分析验证建议 (Metrics reloaded: Pitfalls and recommendations for image analysis validation)

Lena Maier-Hein,Annika Reinke,Patrick Godau,Minu D. Tizabi,Florian Büttner,Evangelia Christodoulou,Ben Glocker,Fabian Isensee,Jens Kleesiek,Michal Kozubek,Mauricio Reyes,Michael A. Riegler,Manuel Wiesenfarth,Emre Kavur,Carole H. Sudre,Michael Baumgartner,Matthias Eisenmann,Doreen Heckmann-Nötzel,A. Tim Rädsch,Laura Acion,Michela Antonelli,Tal Arbel,Spyridon Bakas,Arriel Benis,Matthew Blaschko,M. Jorge Cardoso,Veronika Cheplygina,Beth A. Cimini,Gary S. Collins,Keyvan Farahani,Luciana Ferrer,Adrian Galdran,Bram van Ginneken,Robert Haase,Daniel A. Hashimoto,Michael M. Hoffman,Merel Huisman,Pierre Jannin,Charles E. Kahn,Dagmar Kainmueller,Bernhard Kainz,Alexandros Karargyris,Alan Karthikesalingam,Hannes Kenngott,Florian Kofler,Annette Kopp-Schneider,Anna Kreshuk,Tahsin Kurc,Bennett A. Landman,Geert Litjens,Amin Madani,Klaus Maier-Hein,Anne L. Martel,Peter Mattson,Erik Meijering,Bjoern Menze,Karel G. M. Moons,Henning Müller,Brennan Nichyporuk,Felix Nickel,Jens Petersen,Nasir Rajpoot,Nicola Rieke,Julio Saez-Rodriguez,Clara I. Sánchez,Shravya Shetty,Maarten van Smeden,Ronald M. Summers,Abdel A. Taha,Aleksei Tiulpin,Sotirios A. Tsaftaris,Ben Van Calster,Gaël Varoquaux,Paul F. Jäger

from arxiv, Shared first authors: Lena Maier-Hein, Annika Reinke

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.

翻译：越来越多的证据表明,机器学习(ML)算法验证的缺陷是一个低估了的全球问题。特别是在自动生物医学图像分析中,所选的业绩计量往往没有反映领域利益,从而无法充分衡量科学进步,妨碍将ML技术转化为实践。为了克服这一点,我们庞大的国际专家财团创建了Metris Reloaded,这是一个指导研究人员在有问题时选择和运用适当验证度指标的综合框架。在跨应用领域整合ML方法之后,Metris Reload促进验证方法的趋同。框架是在多阶段的Delphi进程中开发的,基于问题指纹的新概念----即对特定问题的结构性描述,从领域利益到目标结构、数据集和算法输出的特性,从领域到目标结构、数据集的属性,用户选择和应用适当的验证度衡量标准,同时意识到潜在的缺陷。重新加载目标图像分析问题可以被解释为在图像、对象或平分级层面的分类任务,即多数图像级域域域分析的通用域域域、目标探测、具体检索框架的深度分析,以及我们所执行的直径路路路段框架,为我们所展示的在线分析提供的具体检索分析。