Deep neural networks have achieved outstanding performance across various tasks, but they have a critical issue: they make over-confident predictions even for completely unknown samples. Many methods have been proposed to filter out these unknown samples, but they considered only narrow and specific tasks, referred to as misclassification detection, open-set recognition, or out-of-distribution detection. In this work, we argue that these tasks should be treated as fundamentally the same problem, because an ideal model should possess detection capability for all of them. Therefore, we introduce the unknown detection task, an integration of the previous individual tasks, for a rigorous examination of the detection capability of deep neural networks on a wide spectrum of unknown samples. To this end, we constructed unified benchmark datasets on different scales and compared the unknown detection capabilities of existing popular methods. We found that Deep Ensemble consistently outperforms the other approaches in detecting unknowns; however, every method is successful only for a specific type of unknown. The reproducible code and benchmark datasets are available at https://github.com/daintlab/unknown-detection-benchmarks .