Out-of-distribution (OOD) detection has received broad attention in recent years, aiming to ensure the reliability and safety of deep neural networks (DNNs) in real-world scenarios by rejecting incorrect predictions. However, we notice a discrepancy between the conventional evaluation and the essential purpose of OOD detection. On the one hand, the conventional evaluation exclusively considers risks caused by label-space distribution shifts while ignoring the risks from input-space distribution shifts. On the other hand, the conventional evaluation rewards detection methods for not rejecting misclassified images in the validation dataset, even though misclassified images also cause risks and should be rejected. We appeal for rethinking OOD detection from a human-centric perspective: a proper detection method should reject cases where the deep model's prediction mismatches human expectations and accept cases where it meets them. We propose a human-centric evaluation and conduct extensive experiments on 45 classifiers and 8 test datasets. We find that a simple baseline OOD detection method can achieve performance comparable to, or even better than, recently proposed methods, suggesting that progress in OOD detection over the past years may be overestimated. Additionally, our experiments demonstrate that model selection is non-trivial for OOD detection and should be considered an integral part of a proposed method, which contradicts the claim in existing works that proposed methods are universal across different models.
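To make the human-centric criterion concrete, the following is a minimal sketch (not the paper's actual evaluation protocol) of scoring a detector by whether it accepts correct predictions and rejects incorrect ones. It uses the maximum softmax probability (MSP) baseline as the confidence score; the toy logits, labels, and threshold are illustrative assumptions.

```python
import math
import random

random.seed(0)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical toy data standing in for a classifier's outputs on a test set.
num_classes = 10
samples = []
for _ in range(200):
    logits = [random.gauss(0, 1) for _ in range(num_classes)]
    label = random.randrange(num_classes)
    samples.append((logits, label))

threshold = 0.5  # assumed operating point, chosen only for illustration

hits = 0
for logits, label in samples:
    probs = softmax(logits)
    score = max(probs)            # MSP confidence score
    pred = probs.index(score)
    accepted = score >= threshold
    correct = pred == label
    # Human-centric criterion: the detector behaves correctly when it
    # accepts a prediction that matches the label, or rejects one that
    # does not -- regardless of which distribution the input came from.
    if accepted == correct:
        hits += 1

print(f"human-centric detection accuracy: {hits / len(samples):.3f}")
```

Note that, unlike the conventional evaluation, this criterion penalizes accepting a misclassified in-distribution image just as it penalizes accepting an OOD input.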