Deep neural networks can be easily fooled into making incorrect predictions by corrupting the input with adversarial perturbations: human-imperceptible artificial noise. So far, adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples, leaving them more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among the data. Such "uneven vulnerability" is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both the input and weight loss landscapes in an adaptive, instance-specific way, enhancing robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, when combined with the latest data augmentation and semi-supervised learning techniques, our method achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10: 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.
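To make the adversarial training setting concrete, below is a minimal sketch of standard one-step ($\ell_{\infty}$-bounded, FGSM-style) adversarial training on a toy logistic-regression model. This illustrates only the generic baseline the abstract refers to, not the ISEAT method itself; all function names (`fgsm`, `adv_train_step`) and the toy model are illustrative assumptions, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_grad(w, x, y):
    # Gradient of the binary cross-entropy loss w.r.t. the input x,
    # for a logistic model p = sigmoid(w . x).
    p = sigmoid(w @ x)
    return (p - y) * w

def fgsm(w, x, y, eps):
    # One-step l_inf attack: perturb x along the sign of the input gradient,
    # so the perturbation satisfies ||x_adv - x||_inf <= eps.
    return x + eps * np.sign(input_grad(w, x, y))

def adv_train_step(w, x, y, eps, lr):
    # Adversarial training: craft a perturbed input, then take an SGD step
    # on the loss evaluated at that adversarial example.
    x_adv = fgsm(w, x, y, eps)
    p = sigmoid(w @ x_adv)
    grad_w = (p - y) * x_adv          # loss gradient w.r.t. the weights
    return w - lr * grad_w
```

Iterating `adv_train_step` over a dataset gives the usual min-max training loop: the inner `fgsm` call approximately maximizes the loss within the $\ell_{\infty}$ ball, while the outer weight update minimizes it.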