Fooling people with highly realistic fake images generated by Deepfake or GANs causes great disturbance to our society. Many methods have been proposed to detect fake images, but they are vulnerable to adversarial perturbations -- intentionally designed noise that leads to wrong predictions. Existing methods for attacking fake image detectors usually generate adversarial perturbations over almost the entire image, which is redundant and increases the perceptibility of the perturbations. In this paper, we propose a novel method to disrupt fake image detection by identifying the pixels that are key to a fake image detector and attacking only those key pixels, yielding adversarial perturbations whose $L_0$ and $L_2$ norms are much smaller than those of existing works. Experiments on two public datasets with three fake image detectors show that the proposed method achieves state-of-the-art performance in both white-box and black-box attacks.
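To make the key-pixel idea concrete, below is a minimal sketch (not the authors' exact algorithm) of a white-box attack that selects the $k$ pixels with the largest input-gradient magnitude of the detector's "fake" score and perturbs only those pixels. It assumes a PyTorch detector that outputs two logits with index 1 meaning "fake", images normalized to $[0, 1]$, and illustrative values for `k`, `step`, and `iters`.

```python
# Illustrative sketch, assuming a hypothetical two-class PyTorch detector
# (logit index 1 = "fake") and images in [0, 1].
import torch
import torch.nn as nn


def key_pixel_attack(detector: nn.Module, image: torch.Tensor,
                     k: int = 500, step: float = 0.05, iters: int = 10) -> torch.Tensor:
    """Perturb only the k most salient pixels of `image` (shape [1, C, H, W])."""
    # 1) Saliency: gradient of the fake-class logit with respect to the input.
    x = image.clone().detach().requires_grad_(True)
    detector(x)[:, 1].sum().backward()
    saliency = x.grad.detach().abs().sum(dim=1)        # aggregate over channels -> [1, H, W]

    # 2) Key-pixel mask: keep only the top-k spatial locations.
    flat = saliency.view(-1)
    mask = torch.zeros_like(flat)
    mask[flat.topk(k).indices] = 1.0
    mask = mask.view(1, 1, *saliency.shape[1:])        # broadcast over channels

    # 3) Iterative sign-gradient updates restricted to the masked pixels,
    #    pushing the "fake" logit down so the detector misclassifies the image.
    adv = image.clone().detach()
    for _ in range(iters):
        adv.requires_grad_(True)
        loss = detector(adv)[:, 1].sum()
        grad, = torch.autograd.grad(loss, adv)
        adv = (adv - step * grad.sign() * mask).clamp(0, 1).detach()
    return adv
```

Restricting updates to the masked pixels keeps the $L_0$ norm of the perturbation at most $k \cdot C$, in contrast to whole-image attacks that modify nearly every pixel.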