Deep neural network image classifiers are known to be susceptible to adversarial evasion attacks, which use carefully crafted images to mislead a classifier. Recently, a wide range of adversarial attack methods has been proposed, most of which add small perturbations to all pixels of a real image. We find that a considerable portion of the perturbations produced by some widely used attacks contributes little to attacking a classifier, yet typically makes the adversarial image easier to detect for both humans and adversarial attack detection algorithms. It is therefore important to concentrate the perturbations on the most vulnerable pixels of an image, i.e., those that change the classifier's prediction most readily. With such pixel vulnerability, given an existing attack, we can make its adversarial images more realistic and less detectable using fewer perturbations while keeping its attack performance unchanged. Moreover, the discovered vulnerability helps us better understand the weaknesses of deep classifiers. From an information-theoretic perspective, we propose a probabilistic approach for automatically finding the pixel vulnerability of an image, which is compatible with, and improves over, many existing adversarial attacks.
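To make the idea of restricting an attack to vulnerable pixels concrete, the sketch below masks a baseline perturbation so that it is applied only at the highest-scoring pixel locations. Note that the paper's actual vulnerability estimator is probabilistic and information-theoretic and is not reproduced here; as a stand-in proxy, this sketch ranks pixels by input-gradient magnitude. All function names, the `keep_ratio` parameter, and the FGSM baseline are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only (PyTorch). The gradient-magnitude score below is a
# hypothetical proxy for the paper's probabilistic pixel-vulnerability measure.
import torch
import torch.nn.functional as F

def fgsm_perturbation(model, x, y, eps=8 / 255):
    """One-step FGSM perturbation over all pixels (a widely used baseline attack)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return eps * grad.sign()

def vulnerability_mask(model, x, y, keep_ratio=0.1):
    """Proxy vulnerability map: per-pixel gradient magnitude (summed over channels),
    binarized to keep only the top `keep_ratio` fraction of pixel locations."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    score = grad.abs().sum(dim=1, keepdim=True)              # (B, 1, H, W)
    k = max(1, int(keep_ratio * score[0].numel()))
    thresh = score.flatten(1).topk(k, dim=1).values[:, -1]   # per-image k-th largest score
    return (score >= thresh.view(-1, 1, 1, 1)).float()       # broadcasts over channels

def sparse_adversarial_example(model, x, y, eps=8 / 255, keep_ratio=0.1):
    """Apply the baseline perturbation only on the most vulnerable pixels."""
    delta = fgsm_perturbation(model, x, y, eps)
    mask = vulnerability_mask(model, x, y, keep_ratio)
    return (x + mask * delta).clamp(0, 1)
```

Under this reading, any existing attack supplies `delta`, and the vulnerability map only decides where that perturbation is kept, which is why the approach can be layered on top of many attacks without changing how they compute perturbations.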