Out-of-distribution (OOD) detection has recently gained substantial attention because identifying out-of-domain samples is important for reliability and safety. Although OOD detection methods have advanced considerably, they remain susceptible to adversarial examples, which defeats their purpose. Several defenses have recently been proposed to mitigate this issue. Nevertheless, these efforts remain ineffective, as their evaluations are based on either small perturbation sizes or weak attacks. In this work, we re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g., up to the commonly used $\epsilon=8/255$ for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than random detection in the adversarial setting. Next, we aim to provide a robust OOD detection method. In an ideal defense, training should expose the model to almost all possible adversarial perturbations, which can be achieved through adversarial training. That is, such training perturbations should be based on both in- and out-of-distribution samples. Therefore, unlike OOD detection in the standard setting, access to OOD samples, as well as in-distribution samples, appears necessary in the adversarial training setup. These observations lead us to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We subsequently propose the Adversarially Trained Discriminator (ATD), which utilizes a pre-trained robust model to extract robust features and a generator model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the in-distribution data, we significantly outperform all previous methods in robust AUROC while maintaining high standard AUROC and classification accuracy. The code repository is available at https://github.com/rohban-lab/ATD.
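To make the evaluation protocol concrete, the following is a minimal sketch of an $L_\infty$ PGD attack applied to both in-distribution and OOD inputs, followed by a comparison of standard and robust AUROC. It is not the ATD implementation or the authors' attack code: the detector is a toy mean-pixel score chosen so that the gradient is known analytically, the data are synthetic, and the names `score`, `pgd_linf`, `auroc`, and `evaluate`, as well as the step size and step count, are assumptions made for illustration. Only the budget $\epsilon=8/255$ comes from the text.

```python
import random

EPS = 8 / 255  # perturbation budget quoted in the abstract for CIFAR-10


def score(x):
    """Toy detector (an assumption): mean pixel value; higher = more in-distribution."""
    return sum(x) / len(x)


def pgd_linf(x, is_in, eps=EPS, step=2 / 255, n_steps=10):
    """L-infinity PGD against `score`.

    In-distribution inputs are pushed toward lower scores and OOD inputs
    toward higher scores, so both kinds of detection error are attacked.
    """
    direction = -1.0 if is_in else 1.0
    x_adv = list(x)
    for _ in range(n_steps):
        # d(score)/dx_i = 1/len(x) > 0 for every pixel, so sign(grad) = +1.
        x_adv = [xa + direction * step for xa in x_adv]
        # Project back into the eps-ball around x, then into the pixel range [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


def auroc(in_scores, out_scores):
    """Probability that a random in-sample outscores a random OOD sample (ties count 0.5)."""
    wins = sum(
        1.0 if si > so else 0.5 if si == so else 0.0
        for si in in_scores
        for so in out_scores
    )
    return wins / (len(in_scores) * len(out_scores))


def evaluate(n=200, dim=16, seed=0):
    """Return (standard AUROC, robust AUROC) on synthetic data (hypothetical setup)."""
    rng = random.Random(seed)
    ins = [[rng.uniform(0.50, 0.60) for _ in range(dim)] for _ in range(n)]
    outs = [[rng.uniform(0.45, 0.55) for _ in range(dim)] for _ in range(n)]
    standard = auroc([score(x) for x in ins], [score(x) for x in outs])
    robust = auroc(
        [score(pgd_linf(x, is_in=True)) for x in ins],
        [score(pgd_linf(x, is_in=False)) for x in outs],
    )
    return standard, robust


if __name__ == "__main__":
    standard, robust = evaluate()
    print(f"standard AUROC = {standard:.3f}, robust AUROC = {robust:.3f}")
```

In this toy setting the undefended detector separates the two groups well on clean inputs, yet its robust AUROC falls below 0.5, the level of a random detector. A real evaluation would replace the analytic gradient with backpropagation through the full detection pipeline, which is what an end-to-end attack refers to.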