Deep neural networks are known to be overconfident on out-of-distribution (OOD) inputs that clearly do not belong to any class. This is a problem in safety-critical applications, since a reliable assessment of a classifier's uncertainty is a key property that allows the system to trigger human intervention or to transfer into a safe state. In this paper, we aim for certifiable worst-case guarantees for OOD detection by enforcing low confidence not only at the OOD point itself but also in an $l_\infty$-ball around it. For this purpose, we use interval bound propagation (IBP) to upper bound the maximal confidence in the $l_\infty$-ball and minimize this upper bound at training time. We show that non-trivial bounds on the confidence of OOD data, which generalize beyond the OOD dataset seen at training time, are possible. Moreover, in contrast to certified adversarial robustness, which typically comes with a significant loss in prediction performance, certified guarantees for worst-case OOD detection are possible without much loss in accuracy.
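To make the certification step concrete, here is a minimal sketch of how IBP yields the upper bound on the confidence; the notation $\underline{z}, \bar{z}$ for the propagated logit bounds is introduced here for illustration and is not fixed by the abstract. For an affine layer $z = Wx + b$ with elementwise input bounds $\underline{x} \le x \le \bar{x}$, IBP propagates

$$ \bar{z},\, \underline{z} \;=\; W\,\frac{\bar{x}+\underline{x}}{2} + b \;\pm\; |W|\,\frac{\bar{x}-\underline{x}}{2}, $$

where $|W|$ denotes the entrywise absolute value; monotone activations such as ReLU are applied directly to both bounds. Given resulting logit bounds $\underline{z} \le f(x+\delta) \le \bar{z}$ valid for all $\|\delta\|_\infty \le \epsilon$, the worst-case confidence is bounded by

$$ \max_{\|\delta\|_\infty \le \epsilon}\; \max_k \frac{e^{f_k(x+\delta)}}{\sum_j e^{f_j(x+\delta)}} \;\le\; \max_k \frac{e^{\bar{z}_k}}{e^{\bar{z}_k} + \sum_{j \neq k} e^{\underline{z}_j}}, $$

since the softmax probability of class $k$ is increasing in the $k$-th logit and decreasing in all others. Minimizing this right-hand side on OOD inputs during training is what enforces the certified low-confidence guarantee in the $l_\infty$-ball.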