Traditional machine learning follows a close-set assumption: the training and test sets share the same label space. In many practical scenarios, however, some test samples inevitably belong to unknown classes (open-set). To address this issue, Open-Set Recognition (OSR), whose goal is to make correct predictions on both close-set and open-set samples, has attracted rising attention. In this direction, the vast majority of the literature focuses on the pattern of open-set samples. However, how to evaluate model performance on this challenging task remains unsolved. In this paper, a systematic analysis reveals that most existing metrics are essentially inconsistent with the aforementioned goal of OSR: (1) For metrics extended from close-set classification, such as Open-set F-score, Youden's index, and Normalized Accuracy, a poor open-set prediction can escape a low performance score thanks to a superior close-set prediction. (2) Novelty detection AUC, which measures the ranking performance between close-set and open-set samples, ignores the close-set performance. To fix these issues, we propose a novel metric named OpenAUC. Compared with existing metrics, OpenAUC enjoys a concise pairwise formulation that evaluates open-set performance and close-set performance in a coupled manner. Further analysis shows that OpenAUC is free from the aforementioned inconsistencies. Finally, an end-to-end learning method is proposed to minimize the OpenAUC risk, and experimental results on popular benchmark datasets speak to its effectiveness. Project Page: https://github.com/wang22ti/OpenAUC.
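The abstract describes the pairwise, coupled formulation only in words. As a rough illustration, the sketch below assumes one reading of it: a (close-set, open-set) pair counts as a success only when the close-set sample is classified correctly and the open-set sample receives the higher open-set score. The function names `novelty_auc` and `open_auc`, the scores, and the correctness flags are all hypothetical, and ties between scores are simply counted as failures. This is not taken from the project repository.

```python
import numpy as np


def novelty_auc(known_scores, unknown_scores):
    """Fraction of (close-set, open-set) pairs ranked correctly.

    A pair is ranked correctly when the open-set sample has the strictly
    higher open-set score. Close-set predictions play no role here.
    """
    known = np.asarray(known_scores, dtype=float)[:, None]
    unknown = np.asarray(unknown_scores, dtype=float)[None, :]
    return float((unknown > known).mean())


def open_auc(known_scores, known_correct, unknown_scores):
    """Assumed pairwise form of a coupled metric (illustrative only).

    A pair counts only if the close-set sample is classified correctly
    AND the open-set sample has the strictly higher open-set score.
    """
    known = np.asarray(known_scores, dtype=float)[:, None]
    correct = np.asarray(known_correct, dtype=bool)[:, None]
    unknown = np.asarray(unknown_scores, dtype=float)[None, :]
    return float((correct & (unknown > known)).mean())


# Hypothetical open-set scores (higher means "more likely unknown").
known_scores = [0.1, 0.2, 0.3, 0.6]
unknown_scores = [0.5, 0.9]

# Same ranking, two different close-set classifiers.
all_correct = [True, True, True, True]
one_wrong = [True, True, False, True]

print(novelty_auc(known_scores, unknown_scores))          # 0.875
print(open_auc(known_scores, all_correct, unknown_scores))  # 0.875
print(open_auc(known_scores, one_wrong, unknown_scores))    # 0.625
```

Under these assumptions, the novelty detection AUC stays at 0.875 whether or not the close-set predictions are right, which is the blind spot named in point (2). The coupled score drops to 0.625 once a single close-set sample is misclassified, and it can never exceed the ranking-only value.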