Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks, which use carefully crafted images to mislead a classifier. Many adversarial attacks belong to the category of dense attacks, which generate adversarial examples by perturbing all the pixels of a natural image. Sparse attacks have recently been developed to generate sparse perturbations; they are usually standalone attacks derived by augmenting a dense attack's algorithm with sparsity regularisations, which reduces attack efficiency. In this paper, we tackle this task from a different perspective: we select the most effective perturbations from those generated by a dense attack, motivated by our finding that a considerable fraction of the perturbations a dense attack places on an image may contribute little to misleading the classifier. Accordingly, we propose a probabilistic post-hoc framework, trained by mutual information maximisation, that refines a given dense attack by significantly reducing the number of perturbed pixels while preserving its attack power. The proposed model is compatible with arbitrary dense attacks, making their adversarial images more realistic and less detectable with fewer perturbations. Moreover, our framework performs adversarial attacks much faster than existing sparse attacks.
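To make the core idea concrete, the sketch below illustrates post-hoc sparsification of a dense perturbation in PyTorch. It is a simplified stand-in for the paper's learned probabilistic model: a one-step FGSM attack plays the role of the dense attack, and a simple top-k magnitude heuristic replaces the mutual-information-trained pixel selector. The names `classifier`, `image`, and `label`, the budget `k`, and the helper functions are all illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: sparsify a dense adversarial perturbation by keeping
# only its highest-magnitude pixels. A proxy for the paper's probabilistic
# pixel-selection model, not the actual method.
import torch
import torch.nn.functional as F

def fgsm_dense_perturbation(classifier, image, label, eps=8 / 255):
    """One-step dense attack (FGSM): perturbs every pixel of the image."""
    image = image.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(image), label)
    loss.backward()
    return eps * image.grad.sign()  # dense: nonzero at essentially all pixels

def sparsify_topk(delta, k):
    """Keep only the k pixels with the largest perturbation magnitude.

    Pixels are ranked by |delta| summed over colour channels; all other
    pixels are zeroed out, yielding a sparse perturbation.
    """
    b, c, h, w = delta.shape
    score = delta.abs().sum(dim=1).view(b, -1)         # (B, H*W) per-pixel score
    topk = score.topk(k, dim=1).indices                # indices of kept pixels
    mask = torch.zeros_like(score).scatter_(1, topk, 1.0)
    return delta * mask.view(b, 1, h, w)               # zero out the rest

# Usage (shapes only; any image classifier would do):
# delta = fgsm_dense_perturbation(model, x, y)
# sparse_delta = sparsify_topk(delta, k=100)  # perturb only ~100 pixels
# x_adv = (x + sparse_delta).clamp(0, 1)
```

If, as the paper finds, many pixels of a dense perturbation contribute little to the attack, a selection step of this kind can retain most of the attack power at a fraction of the perturbed pixels; the paper replaces the magnitude heuristic with a trained probabilistic selector.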