Adversarial attacks aim to fool a target classifier with imperceptible perturbations. Adversarial examples, which are carefully crafted with malicious intent, can lead to erroneous predictions and, in turn, catastrophic accidents. To mitigate the effect of adversarial attacks, we propose a novel purification model called CAP-GAN. CAP-GAN combines pixel-level and feature-level consistency to achieve reasonable purification under cycle-consistent learning. Specifically, we employ a guided attention module and knowledge distillation to convey meaningful information to the purification model. Once the model is fully trained, inputs are projected through the purification model and transformed into clean-like images. We vary the capacity of the adversary to demonstrate robustness against various types of attack strategies. On the CIFAR-10 dataset, CAP-GAN outperforms other pre-processing-based defenses under both black-box and white-box settings.
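As a rough illustration of the pixel-level and feature-level consistency idea, the sketch below combines a pixel-space reconstruction loss with a distillation-style loss on classifier outputs. The abstract does not specify CAP-GAN's exact objective, so the loss choices, the `purifier` and `classifier` modules, and the weighting terms `lambda_pix` and `lambda_feat` are all illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(purifier, classifier, x_adv, x_clean,
                     lambda_pix=1.0, lambda_feat=1.0):
    """Hypothetical purification objective: the purified adversarial
    image should match its clean counterpart both in pixel space and
    in the target classifier's output space."""
    x_pur = purifier(x_adv)  # purified, "clean-like" image

    # Pixel-level consistency: purified output should resemble the clean image.
    loss_pix = F.l1_loss(x_pur, x_clean)

    # Feature-level consistency (knowledge-distillation style): the
    # classifier's prediction on the purified image should match its
    # prediction on the clean image, treated here as a fixed teacher.
    with torch.no_grad():
        teacher = F.softmax(classifier(x_clean), dim=1)
    student = F.log_softmax(classifier(x_pur), dim=1)
    loss_feat = F.kl_div(student, teacher, reduction="batchmean")

    return lambda_pix * loss_pix + lambda_feat * loss_feat
```

In a full cycle-consistent setup, an analogous term would constrain the reverse mapping as well, so that clean and purified domains remain aligned in both directions.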