In this paper, we introduce a novel neural network training framework that improves a model's robustness to adversarial attacks while maintaining high clean accuracy by combining contrastive learning (CL) with adversarial training (AT). We propose to improve robustness by learning feature representations that are consistent under both data augmentations and adversarial perturbations. We leverage contrastive learning by treating an adversarial example as another positive example, and aim to maximize the similarity between random augmentations of a data sample and its adversarial example, while continually updating the classification head to avoid a dissociation between the classification head and the embedding space. This dissociation arises because CL updates the network only up to the embedding space, while freezing the classification head that is used to generate new positive adversarial examples. We validate our method, Contrastive Learning with Adversarial Features (CLAF), on the CIFAR-10 dataset, where it outperforms alternative supervised and self-supervised adversarial learning methods in both robust accuracy and clean accuracy.
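To make the training step concrete, below is a minimal PyTorch sketch of the idea described above: adversarial examples are generated with a PGD attack against the (continually updated) classification head, treated as an additional positive view in an NT-Xent contrastive loss, and a cross-entropy term keeps the head aligned with the embedding space. All names (`SimpleEncoder`, `pgd_attack`, `nt_xent`, `claf_step`) and hyperparameters are illustrative assumptions, not the authors' implementation; averaging pairwise NT-Xent losses is one simple way to handle three positive views.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleEncoder(nn.Module):
    """Toy encoder with a projection head (embedding space for CL)
    and a classification head (used to craft adversarial positives)."""
    def __init__(self, dim=128, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(32, dim)        # embedding space
        self.head = nn.Linear(32, n_classes)  # classification head

    def forward(self, x):
        h = self.backbone(x)
        return F.normalize(self.proj(h), dim=1), self.head(h)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=5):
    """PGD attack on the classification head; the result is the
    'adversarial positive' used by the contrastive loss."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        _, logits = model(x_adv)
        loss = F.cross_entropy(logits, y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into eps-ball
        x_adv = x_adv.clamp(0, 1)                  # valid image range
    return x_adv.detach()

def nt_xent(za, zb, tau=0.5):
    """Standard NT-Xent loss between two batches of normalized embeddings."""
    n = za.size(0)
    z = torch.cat([za, zb], dim=0)                 # (2B, dim)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))              # mask self-similarity
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def claf_step(model, opt, x1, x2, y, lam=1.0):
    """One training step: pull two augmentations and the adversarial
    example of each image together, and keep training the head."""
    x_adv = pgd_attack(model, x1, y)
    z1, _ = model(x1)
    z2, _ = model(x2)
    z_adv, logits_adv = model(x_adv)
    # Adversarial example acts as a third positive view of the same image.
    loss_cl = (nt_xent(z1, z2) + nt_xent(z1, z_adv) + nt_xent(z2, z_adv)) / 3
    # Cross-entropy keeps the classification head aligned with the
    # evolving embedding space, avoiding the dissociation noted above.
    loss_ce = F.cross_entropy(logits_adv, y)
    loss = loss_cl + lam * loss_ce
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    model = SimpleEncoder()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x1 = torch.rand(8, 3, 32, 32)   # first random augmentation
    x2 = torch.rand(8, 3, 32, 32)   # second random augmentation
    y = torch.randint(0, 10, (8,))
    print(claf_step(model, opt, x1, x2, y))
```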