When applying machine learning in safety-critical systems, a reliable assessment of a classifier's uncertainty is required. However, deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data, and even if trained to be non-confident on OOD data, one can still adversarially manipulate OOD inputs so that the classifier again assigns high confidence to the manipulated samples. In this paper we propose a novel method that, from first principles, combines a certifiable OOD detector with a standard classifier into an OOD-aware classifier. In this way we achieve the best of both worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and with close to state-of-the-art OOD detection performance on non-manipulated OOD data. Moreover, due to its particular construction, our classifier provably avoids the asymptotic overconfidence problem of standard neural networks.
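One plausible way to realize such a combination of an OOD detector and a standard classifier is a convex mixture: the detector's estimated in-distribution probability gates the classifier's softmax output, and the remaining mass is assigned to the uniform distribution, so that confidence provably decays to chance level on inputs the detector flags as OOD. The abstract does not spell out the construction, so the sketch below (function and parameter names are illustrative assumptions) is only a minimal illustration of this idea, not the paper's exact method:

```python
import numpy as np

def ood_aware_predict(classifier_probs, p_in):
    """Blend a classifier's class probabilities with an OOD detector's
    estimated in-distribution probability p_in (a scalar in [0, 1]).

    As p_in -> 0 the output tends to the uniform distribution over the
    K classes, i.e. maximal predictive uncertainty on OOD inputs;
    as p_in -> 1 the standard classifier is recovered unchanged,
    so in-distribution accuracy is unaffected.
    """
    probs = np.asarray(classifier_probs, dtype=float)
    uniform = np.full_like(probs, 1.0 / probs.size)
    return p_in * probs + (1.0 - p_in) * uniform
```

On an input the detector considers clearly in-distribution (`p_in = 1.0`) the classifier's prediction passes through unchanged, while for `p_in = 0.0` every class receives probability 1/K, so no OOD input can be assigned high confidence.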