Deep neural networks (DNNs) are threatened by adversarial examples. Adversarial detection, which distinguishes adversarial images from benign ones, is fundamental to robust DNN-based services. Image transformation is one of the most effective approaches for detecting adversarial examples. Over the last few years, a variety of image transformations have been studied and discussed to design reliable adversarial detectors. In this paper, we systematically synthesize recent progress on adversarial detection via image transformations using a novel classification method. We then conduct extensive experiments to evaluate the detection performance of image transformations against state-of-the-art adversarial attacks. Furthermore, we show that no individual transformation is capable of detecting adversarial examples robustly, and we propose a DNN-based approach, referred to as \emph{AdvJudge}, that combines the scores of 9 image transformations. Without knowing which individual scores are misleading, AdvJudge can still make the correct judgment and achieves a significant improvement in detection rate. Finally, we use an explainable AI tool to show the contribution of each image transformation to adversarial detection. Experimental results show that the contributions of individual image transformations to adversarial detection differ significantly, and that combining them substantially improves the generic detection ability against state-of-the-art adversarial attacks.
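The abstract describes AdvJudge only at a high level. A minimal sketch of the score-combination idea follows, assuming a small PyTorch MLP over the 9 per-transformation detection scores; all names, dimensions, and the synthetic data are illustrative assumptions, not the paper's implementation:

```python
# Sketch (NOT the authors' implementation): learn a benign/adversarial
# judgment from a vector of per-transformation detection scores.
import torch
import torch.nn as nn

NUM_TRANSFORMS = 9  # one detection score per image transformation (assumption)

class ScoreJudge(nn.Module):
    """Maps a 9-dim score vector to benign/adversarial logits."""
    def __init__(self, in_dim: int = NUM_TRANSFORMS, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # class 0 = benign, class 1 = adversarial
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return self.net(scores)

# Toy training loop on synthetic scores, purely to show the shape of the idea.
model = ScoreJudge()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scores = torch.rand(64, NUM_TRANSFORMS)  # placeholder score vectors
labels = torch.randint(0, 2, (64,))      # placeholder benign/adversarial labels
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(scores), labels)
    loss.backward()
    opt.step()
```

The point of the design, as the abstract describes it, is that the learned combiner need not know which individual scores are misleading for a given input; it learns to weigh all 9 jointly.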