As the use of machine learning continues to expand, ensuring its safety becomes increasingly important. A key concern is the ability to identify whether a given sample comes from the training distribution or is an "Out-Of-Distribution" (OOD) sample. In addition, adversaries can manipulate OOD samples in ways that lead a classifier to make a confident prediction. In this study, we present a novel approach for certifying the robustness of OOD detection within an $\ell_2$-norm ball around the input, regardless of network architecture and without the need for specific components or additional training. Further, we improve current techniques for detecting adversarial attacks on OOD samples, while providing high levels of certified and adversarial robustness on in-distribution samples. The average of all OOD detection metrics on CIFAR10/100 shows an increase of $\sim 13\% / 5\%$ relative to previous approaches.
Title: Diffusion Denoised Smoothing for Certified and Adversarial Robust Out-Of-Distribution Detection
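To make the notion of an $\ell_2$ certificate for OOD detection concrete, the following is a minimal sketch of the standard Gaussian randomized smoothing bound for a score in $[0, 1]$. It is not taken from the abstract and is not necessarily the certificate the authors use. The function name `certified_upper_bound` and its parameters are hypothetical. The sketch assumes the smoothed score $g(x) = \mathbb{E}_{\delta \sim \mathcal{N}(0, \sigma^2 I)}[f(x + \delta)]$ is already available, in practice as a high-confidence upper estimate obtained by Monte Carlo sampling.

```python
from statistics import NormalDist

# Standard normal distribution, used for its CDF and inverse CDF.
_STD_NORMAL = NormalDist()


def certified_upper_bound(smoothed_score: float, epsilon: float, sigma: float) -> float:
    """Upper-bound a Gaussian-smoothed score over an l2 ball.

    Assumption (not from the source): the score f maps inputs to [0, 1] and
    g is f smoothed with isotropic Gaussian noise of standard deviation sigma.
    Then Phi^{-1}(g(x)) is (1 / sigma)-Lipschitz in the l2 norm, so for every
    x' with ||x' - x||_2 <= epsilon:
        g(x') <= Phi(Phi^{-1}(g(x)) + epsilon / sigma).
    """
    if not 0.0 < smoothed_score < 1.0:
        raise ValueError("smoothed_score must lie strictly between 0 and 1")
    if epsilon < 0.0 or sigma <= 0.0:
        raise ValueError("epsilon must be non-negative and sigma positive")
    shifted = _STD_NORMAL.inv_cdf(smoothed_score) + epsilon / sigma
    return _STD_NORMAL.cdf(shifted)


if __name__ == "__main__":
    # Hypothetical numbers: smoothed confidence 0.2 on an OOD sample, sigma = 0.25.
    for eps in (0.0, 0.1, 0.25):
        print(eps, certified_upper_bound(0.2, eps, 0.25))
```

If the bound stays below the detector's confidence threshold, no perturbation inside the ball of radius `epsilon` can push the smoothed confidence on that OOD sample above the threshold. A larger `sigma` gives a slower-growing bound at the cost of a noisier base score.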