Over the last few years, convolutional neural networks (CNNs) have been shown to achieve super-human performance in visual recognition tasks. However, CNNs can easily be fooled by adversarial examples, i.e., maliciously crafted images that force the networks to predict an incorrect output while being extremely similar to those for which a correct output is predicted. Regular adversarial examples are not robust to input image transformations, a property that can be exploited to detect whether an adversarial example is being presented to the network. Nevertheless, it is still possible to generate adversarial examples that are robust to such transformations. This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology, called \textit{defense perturbation}, to detect robust adversarial examples with the same input transformations to which the adversarial examples are robust. Such a \textit{defense perturbation} is shown to be an effective countermeasure to robust adversarial examples. Furthermore, multi-network adversarial examples are introduced. This kind of adversarial example can be used to simultaneously fool multiple networks, which is critical in systems that use network redundancy, such as those based on architectures with majority voting over multiple CNNs. An extensive set of experiments based on state-of-the-art CNNs trained on the ImageNet dataset is finally reported.
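As a rough illustration of the transformation-based detection idea summarized above, the sketch below flags an input whose top-1 prediction changes when the image is slightly rotated. The choice of a torchvision ResNet-50, the rotation angle, and the prediction-mismatch criterion are illustrative assumptions, not the configuration used in the paper.

\begin{verbatim}
# Minimal sketch of transformation-based adversarial-example detection.
# Model, transformation, and decision rule are illustrative assumptions.
import torch
import torchvision.transforms.functional as TF
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()  # standard ImageNet preprocessing

def is_suspected_adversarial(image, angle=5.0):
    """Flag an input whose top-1 class changes under a small rotation."""
    x = preprocess(image).unsqueeze(0)   # original input, batch of 1
    x_t = TF.rotate(x, angle)            # transformed input
    with torch.no_grad():
        pred = model(x).argmax(dim=1)
        pred_t = model(x_t).argmax(dim=1)
    # A clean image usually keeps its label under a small transformation;
    # a non-robust adversarial example often does not.
    return bool(pred.item() != pred_t.item())
\end{verbatim}

Robust adversarial examples are crafted to keep their (wrong) label under exactly such transformations, which is why a simple consistency check like this one is no longer sufficient and motivates the defense perturbation proposed in the paper.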