The vulnerability of deep neural networks (DNNs) to adversarial examples has attracted increasing attention, and many algorithms have been proposed to craft powerful adversarial examples. However, these algorithms modify global or local regions of pixels without taking network explanations into account. Hence, the resulting perturbations are redundant and easily detected by the human eye. In this paper, we propose a novel method for generating local-region perturbations. The main idea is to find the contributing feature regions (CFRs) of images, identified through network explanations, as the targets for perturbation. Because they are grounded in network explanations, perturbations added to the CFRs are more effective than those added to other regions. In our method, a soft mask matrix is designed to represent the CFRs, finely characterizing the contribution of each pixel. Based on this soft mask, we develop a new objective function with an inverse temperature to search for optimal perturbations within the CFRs. Extensive experiments conducted on CIFAR-10 and ILSVRC2012 demonstrate the effectiveness of our method in terms of attack success rate, imperceptibility, and transferability.
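To make the idea concrete, below is a minimal illustrative sketch in PyTorch of how a soft mask over CFRs could gate a perturbation search. The abstract does not give the exact formulation, so the saliency source behind the soft mask, the particular attack loss, and the inverse-temperature parameter `beta` are all assumptions introduced here for illustration, not the paper's actual method.

```python
# Hypothetical sketch: the abstract specifies neither how CFRs are extracted
# nor the exact objective. Here the soft mask is derived from a generic
# saliency map, and the loss uses an inverse-temperature-scaled softmax;
# both choices are assumptions for illustration only.
import torch
import torch.nn.functional as F

def soft_mask_from_saliency(saliency, tau=0.25):
    # Normalize a saliency map to [0, 1]; the smooth threshold yields a
    # *soft* mask whose values weight each pixel's contribution.
    s = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    return torch.sigmoid((s - tau) / 0.1)

def cfr_attack(model, x, y, mask, steps=50, lr=0.01, beta=10.0):
    """Search for a perturbation confined to the CFRs given by `mask`.

    `beta` plays the role of an inverse temperature: larger values sharpen
    the softmax over logits inside the (untargeted) attack loss.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # The soft mask gates the perturbation, so pixels outside the
        # contributing feature regions are left (nearly) untouched.
        x_adv = torch.clamp(x + mask * delta, 0.0, 1.0)
        logits = model(x_adv)
        probs = F.softmax(beta * logits, dim=1)  # inverse-temperature softmax
        # Push down the true-class probability; penalize large perturbations.
        loss = probs[torch.arange(len(y)), y].sum() + 1e-3 * delta.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.clamp(x + mask * delta.detach(), 0.0, 1.0)
```

Because the mask is soft rather than binary, the optimizer can trade off perturbation strength against each pixel's estimated contribution, which is consistent with the abstract's claim that the mask "finely characterizes" per-pixel contributions.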