Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, the chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder the translation of ML techniques into practice. To overcome this, a large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers towards choosing metrics in a problem-aware manner. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. Users are guided through the process of selecting and applying appropriate validation metrics while being made aware of potential pitfalls. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a common point of access to explore weaknesses and strengths of the most common validation metrics. An instantiation of the framework for various biological and medical image analysis use cases demonstrates its broad applicability across domains.
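The claim that a chosen metric may not reflect the domain interest can be illustrated with a small, self-contained sketch. It is not taken from the Metrics Reloaded framework or its online tool: the screening scenario, the 5% prevalence, and the function names `accuracy` and `balanced_accuracy` are assumptions made purely for illustration. A classifier that never reports disease scores well on plain accuracy, although it is useless for the clinical goal of finding positive cases.

```python
# Illustrative sketch only: a hypothetical image-level screening task with
# 5% disease prevalence. Labels: 1 = disease present, 0 = healthy.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Indices of all reference samples belonging to this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Assumed data: 5 diseased and 95 healthy images.
y_true = [1] * 5 + [0] * 95
# A degenerate "classifier" that always predicts healthy.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy          = {acc:.2f}")
print(f"balanced accuracy = {bal_acc:.2f}")
```

Under these assumptions, accuracy rewards the majority class, whereas balanced accuracy exposes that no diseased image was detected. Which of the two (if either) is appropriate depends on the domain interest, which is the kind of decision a problem-aware selection process is meant to make explicit.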