Adding perturbations by utilizing auxiliary gradient information or discarding existing details of the benign images are two common approaches for generating adversarial examples. Although visual imperceptibility is a desired property of adversarial examples, conventional adversarial attacks still generate traceable adversarial perturbations. In this paper, we introduce a novel Adversarial Attack via Invertible Neural Networks (AdvINN) method to produce robust and imperceptible adversarial examples. Specifically, AdvINN fully exploits the information-preservation property of Invertible Neural Networks and thereby generates adversarial examples by simultaneously adding class-specific semantic information of the target class and dropping discriminant information of the original class. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that the proposed AdvINN method can produce more imperceptible adversarial images than state-of-the-art methods, and that AdvINN yields more robust adversarial examples with higher confidence than other adversarial attacks.
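To make the information-preservation property concrete, below is a minimal sketch (not the authors' code) of an affine coupling block in the style of invertible neural networks such as RealNVP. It assumes two image branches, one holding the benign image and one holding target-class information, and mixes them invertibly so that no information is lost; the names CouplingBlock, subnet, and the 3-channel 32x32 shapes are illustrative assumptions, not AdvINN's actual architecture.

```python
# Minimal sketch of an invertible affine coupling block (assumptions:
# two 3-channel branches x1 = benign image, x2 = target-class image;
# subnet() is a hypothetical small conv net, not the paper's design).
import torch
import torch.nn as nn

def subnet(channels: int) -> nn.Sequential:
    """A small conv net used inside the coupling block (hypothetical)."""
    return nn.Sequential(
        nn.Conv2d(channels, 32, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, channels, 3, padding=1),
    )

class CouplingBlock(nn.Module):
    """Affine coupling: y1 = x1 + f(x2); y2 = x2 * exp(s(y1)) + t(y1)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.f = subnet(channels)   # additive update for branch 1
        self.s = subnet(channels)   # log-scale for branch 2 (tanh-clamped)
        self.t = subnet(channels)   # translation for branch 2

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 * torch.exp(torch.tanh(self.s(y1))) + self.t(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Exact inverse of forward(); no information about (x1, x2) is lost.
        x2 = (y2 - self.t(y1)) * torch.exp(-torch.tanh(self.s(y1)))
        x1 = y1 - self.f(x2)
        return x1, x2

# Invertibility check: the inverse recovers both inputs up to numerical
# precision, illustrating the information preservation the abstract cites.
block = CouplingBlock()
x1, x2 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
assert torch.allclose(x1, r1, atol=1e-5) and torch.allclose(x2, r2, atol=1e-5)
```

Because every coupling block is exactly invertible, information added to one branch (e.g., target-class semantics) must be accounted for by information removed from the other, which is the intuition behind simultaneously adding target-class details and dropping discriminant details of the original class.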