Deep neural networks (DNNs) are threatened by adversarial examples. Adversarial detection, which distinguishes adversarial images from benign ones, is fundamental for robust DNN-based services. Image transformation is one of the most effective approaches to detecting adversarial examples. Over the last few years, a variety of image transformations have been studied and discussed for designing reliable adversarial detectors. In this paper, we systematically synthesize recent progress on adversarial detection via image transformations using a novel classification scheme. We then conduct extensive experiments to evaluate the detection performance of image transformations against state-of-the-art adversarial attacks. Furthermore, we reveal that no individual transformation is capable of detecting adversarial examples robustly, and we propose a DNN-based approach, referred to as AdvJudge, which combines the scores of 9 image transformations. Without knowing which individual scores are misleading, AdvJudge makes the right judgment and achieves a significant improvement in detection accuracy. We claim that AdvJudge is a more effective adversarial detector than those based on any individual image transformation.
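The abstract does not specify how the 9 transformation scores are combined. As an illustration only, the sketch below shows one plausible form of such a score-combining detector: a small fully connected network that maps the per-transformation scores of an image to an adversarial/benign decision. The class name, hidden size, and architecture are assumptions for illustration, not the authors' actual design.

```python
# Hypothetical sketch of a score-combining detector in the spirit of AdvJudge.
# Assumption: the 9 per-transformation detection scores for each image are
# already computed and normalized to [0, 1]; the MLP architecture is illustrative.
import torch
import torch.nn as nn


class ScoreCombiner(nn.Module):
    def __init__(self, num_transforms: int = 9, hidden: int = 32):
        super().__init__()
        # Input: one score per image transformation; output: logit for "adversarial".
        self.net = nn.Sequential(
            nn.Linear(num_transforms, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return self.net(scores)


if __name__ == "__main__":
    model = ScoreCombiner()
    # Dummy batch of 4 images, each described by 9 transformation scores.
    scores = torch.rand(4, 9)
    probs = torch.sigmoid(model(scores))
    print(probs)  # estimated probability that each image is adversarial
```

A learned combiner of this kind avoids hand-tuned thresholds for each transformation, which is consistent with the claim that AdvJudge need not know which individual scores are misleading.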