The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted increasing attention, and many algorithms have been proposed to craft powerful adversarial examples. However, most of these algorithms modify pixels in global or local regions without taking network explanations into account. As a result, the perturbations are redundant and easily detected by the human eye. In this paper, we propose a novel method for generating local-region perturbations. The main idea is to find the contributing feature region (CFR) of an image by simulating the human attention mechanism and then add perturbations only to the CFR. Furthermore, a soft mask matrix is designed on the basis of an activation map to finely represent the contribution of each pixel in the CFR. With this soft mask, we develop a new loss function with an inverse temperature term to search for optimal perturbations within the CFR. Because the perturbations are guided by network explanations, those added to the CFR are more effective than perturbations added to other regions. Extensive experiments conducted on CIFAR-10 and ILSVRC2012 demonstrate the effectiveness of the proposed method in terms of attack success rate, imperceptibility, and transferability.
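To make the core idea concrete, the following PyTorch-style sketch illustrates how a class activation map might be upsampled into a soft mask and used to weight per-pixel perturbations so that they concentrate in the CFR. It is a minimal illustration under our own assumptions, not the paper's exact procedure; the function names, the sigmoid sharpening, and the parameter `tau` (standing in for the inverse temperature) are hypothetical.

```python
import torch
import torch.nn.functional as F

def soft_mask_from_activation(activation_map, size, tau=0.5):
    """Build a soft mask in [0, 1] from an activation map.

    activation_map: (H', W') tensor, e.g. from the last convolutional layer
    size:           (H, W) resolution of the input image
    tau:            temperature-like sharpening factor (illustrative assumption)
    """
    # Upsample the activation map to the input resolution.
    m = F.interpolate(activation_map[None, None], size=size,
                      mode="bilinear", align_corners=False)[0, 0]
    # Normalize to [0, 1], then soft-threshold so high-contribution pixels
    # receive weights close to 1 and low-contribution pixels close to 0.
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)
    return torch.sigmoid((m - m.mean()) / tau)

def masked_attack_step(x, grad, mask, step_size=1.0 / 255):
    """One signed-gradient step whose magnitude is gated by the soft mask,
    so pixels outside the contributing feature region are barely perturbed."""
    return (x + step_size * mask * grad.sign()).clamp(0.0, 1.0)
```

In this sketch the mask plays the role of the soft mask matrix described above: rather than a hard binary region, each pixel's perturbation budget scales with its estimated contribution, which is what allows the perturbation to stay both effective and less perceptible.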