Despite the success of convolutional neural networks (CNNs) in many computer vision and image analysis tasks, they remain vulnerable to so-called adversarial attacks: small, crafted perturbations of the input images can lead to incorrect predictions. A possible defense is to detect adversarial examples. In this work, we show how analysis of input images and feature maps in the Fourier domain can be used to distinguish benign test samples from adversarial images. We propose two novel detection methods: our first method employs the magnitude spectrum of the input images to detect an adversarial attack. This simple and robust classifier can successfully detect adversarial perturbations of three commonly used attack methods. The second method builds upon the first and additionally extracts the phase of the Fourier coefficients of feature maps at different layers of the network. With this extension, we improve adversarial detection rates compared to state-of-the-art detectors on five different attack methods.
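The abstract does not specify the exact feature pipeline, so the following is only a minimal sketch of how the magnitude spectrum of an input image could be turned into a feature vector for a binary detector. The grayscale conversion, log scaling, and the choice of classifier are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fourier_magnitude_feature(image):
    """Compute the log-magnitude spectrum of a 2D image as a flat feature
    vector for a simple benign-vs-adversarial detector.

    `image` is assumed to be a 2D float array (H, W); averaging the
    channels of an RGB input beforehand is one possible choice.
    """
    spectrum = np.fft.fft2(image)            # 2D discrete Fourier transform
    spectrum = np.fft.fftshift(spectrum)     # move low frequencies to the center
    magnitude = np.log1p(np.abs(spectrum))   # log scale compresses the dynamic range
    return magnitude.ravel()

# Hypothetical usage: collect features of benign and adversarial images and
# fit any off-the-shelf binary classifier on them, e.g.:
#   from sklearn.linear_model import LogisticRegression
#   detector = LogisticRegression(max_iter=1000).fit(X_train, y_train)
```

The second method described in the abstract would extend this idea by also computing the phase of the Fourier coefficients of intermediate feature maps and concatenating those statistics across layers; that part is not sketched here.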