Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in the real world. Existing approaches for detecting OOD examples work well when evaluated on benign in-distribution and OOD samples. However, in this paper, we show that existing detection mechanisms can be extremely brittle when evaluated on in-distribution and OOD inputs with minimal adversarial perturbations that do not change their semantics. Formally, we extensively study the problem of Robust Out-of-Distribution Detection on common OOD detection approaches, and show that state-of-the-art OOD detectors can be easily fooled by adding small perturbations to in-distribution and OOD inputs. To counteract these threats, we propose an effective algorithm called ALOE, which performs robust training by exposing the model to both adversarially crafted inlier and outlier examples. Our method can be flexibly combined with existing methods and renders them robust. On common benchmark datasets, we show that ALOE substantially improves the robustness of state-of-the-art OOD detection, with a 58.4% AUROC improvement on CIFAR-10 and a 46.59% improvement on CIFAR-100.
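The robust-training idea sketched in the abstract can be illustrated with a minimal example. The sketch below is an assumption-laden toy version, not the paper's implementation: it uses a linear softmax classifier, single-step FGSM-style perturbations, and a cross-entropy-to-uniform outlier loss, with the perturbation budget `eps` and loss weight `lam` chosen arbitrarily for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_inlier(W, b, x, y, eps):
    """One-step perturbation that increases the classification loss on inlier x.

    For a linear model, the gradient of cross-entropy w.r.t. x is
    W^T (p - onehot_y), so a sign step in that direction raises the loss.
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)
    return x + eps * np.sign(grad_x)

def fgsm_outlier(W, b, x, eps):
    """One-step perturbation that pushes an outlier's predictions away from
    the uniform distribution (i.e., makes the outlier look more confident)."""
    K = W.shape[0]
    p = softmax(W @ x + b)
    grad_x = W.T @ (p - 1.0 / K)  # gradient of CE(uniform, p) w.r.t. x
    return x + eps * np.sign(grad_x)

def aloe_style_loss(W, b, x_in, y, x_out, eps=0.1, lam=0.5):
    """Toy training objective in the spirit of the abstract: classify
    adversarial inliers correctly, and keep adversarial outliers close
    to the uniform distribution over classes."""
    x_in_adv = fgsm_inlier(W, b, x_in, y, eps)
    x_out_adv = fgsm_outlier(W, b, x_out, eps)
    p_in = softmax(W @ x_in_adv + b)
    p_out = softmax(W @ x_out_adv + b)
    ce_in = -np.log(p_in[y])              # cross-entropy on adversarial inlier
    ce_out = -np.mean(np.log(p_out))      # cross-entropy to uniform on outlier
    return ce_in + lam * ce_out
```

In a real model the gradients would come from backpropagation rather than the closed-form expressions used here; the closed forms apply only because the toy classifier is linear.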