Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.
翻译:深心神经网络(DNNs) 被发现易受到因投入中增加小放大扰动而引发的对抗性例子。 这种对抗性例子可以误导 DNNs, 以产生敌对结果。 提出了不同的攻击战略, 以产生对抗性例子, 但如何以高认知质量和更有效的方式产生这些例子需要更多的研究努力。 在本文中, 我们建议 AdvGAN 以基因化对抗性对立网络( GANs) 生成对抗性例子, 可以学习并大致了解原始事例的分布。 对于 AdvGAN 来说, 一旦发电机接受了培训, 它可以有效地产生对抗性干扰, 从而有可能加速对抗性攻击性训练, 从而加速对抗性攻击性训练作为防御。 我们在半白箱和黑箱攻击环境中都应用AdvGAN 。 在半白箱攻击后, 我们不需要使用原始的目标模型, 与传统的白箱攻击攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性攻击性