Adversarial attacks in the input (pixel) space typically incorporate noise margins, such as $L_1$- or $L_{\infty}$-norm bounds, to produce imperceptibly perturbed data that fool deep neural networks. Such noise margins confine the magnitude of the permissible perturbation. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing the need for margin-based priors. Experiments on the MNIST, CIFAR10, Fashion-MNIST, CIFAR100, and Stanford Dogs datasets demonstrate the effectiveness of the proposed method at generating adversarial attacks in the latent space while ensuring a high degree of visual realism compared to pixel-based adversarial attack methods.
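To make the contrast with pixel-space attacks concrete, the sketch below shows one common way to realize an unconstrained latent-space attack: the latent code of a pretrained generator is perturbed to maximize the classifier's loss, with no $L_p$ bound on the perturbation. This is a minimal illustration of the general idea, not the paper's exact method; PyTorch, the generator `G`, the classifier `f`, and all hyperparameters here are assumptions for illustration only.

```python
# Hypothetical sketch of a latent-space adversarial attack.
# Assumes: G maps latent codes z to images, f returns class logits.
import torch
import torch.nn.functional as F

def latent_attack(G, f, z, y_true, steps=50, lr=0.05):
    """Perturb the latent code z so that the generated image G(z + delta)
    is misclassified by f. Note: no pixel-space norm constraint is imposed,
    unlike L1/L-infinity margin-based attacks."""
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = G(z + delta)                        # decode perturbed latent to image space
        loss = -F.cross_entropy(f(x_adv), y_true)   # maximize classification loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z + delta).detach()
```

Because the perturbation is applied before the generator, the output stays on (or near) the generator's image manifold, which is one intuition for why such attacks can remain visually realistic without a margin-based prior.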