Deep neural networks can be easily fooled into making incorrect predictions by corrupting the input with adversarial perturbations: human-imperceptible artificial noise. So far, adversarial training has been the most successful defense against such adversarial attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that during training, an overall reduction of adversarial loss is achieved by sacrificing a considerable proportion of training samples, leaving them more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability among the data. Such "uneven vulnerability" is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both the input and weight loss landscapes in an adaptive, instance-specific way, enhancing robustness more for those samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, our method, when combined with the latest data augmentation and semi-supervised learning techniques, achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10: 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.
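To make the idea of instance-adaptive joint smoothing concrete, here is a toy NumPy sketch, not the authors' implementation: it uses a one-step sign-gradient (FGSM-style) input perturbation, an AWP-style ascent step on the weights as a stand-in for weight-landscape smoothing, and a softmax over per-instance adversarial losses to upweight the more vulnerable samples. The function names (`iseat_step`, `instance_weights`), the logistic-regression model, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(p, y):
    # Per-instance binary cross-entropy.
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fgsm_perturb(x, y, w, eps):
    """One-step sign-gradient input perturbation (FGSM-style,
    a stand-in for the paper's input-landscape smoothing)."""
    p = sigmoid(x @ w)
    # d(loss)/dx for logistic regression is (p - y) * w, per instance.
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def instance_weights(x_adv, y, w, tau=1.0):
    """Instance-adaptive weights: samples with higher adversarial loss
    (more vulnerable) receive larger weights; softmax-normalised so the
    mean weight is 1. The softmax form is an assumption."""
    losses = bce_loss(sigmoid(x_adv @ w), y)
    e = np.exp((losses - losses.max()) / tau)
    return e / e.sum() * len(y)

def weight_perturb(x_adv, y, w, gamma):
    """Normalised ascent step on the weights (AWP-style stand-in for
    weight-landscape smoothing)."""
    p = sigmoid(x_adv @ w)
    grad_w = x_adv.T @ (p - y) / len(y)
    return w + gamma * grad_w / (np.linalg.norm(grad_w) + 1e-12)

def iseat_step(x, y, w, eps=0.1, gamma=0.05, lr=0.1, tau=1.0):
    """One illustrative training step: perturb inputs, compute
    instance-adaptive weights, perturb weights, then descend on the
    weighted adversarial loss."""
    x_adv = fgsm_perturb(x, y, w, eps)
    lam = instance_weights(x_adv, y, w, tau)
    w_adv = weight_perturb(x_adv, y, w, gamma)
    p = sigmoid(x_adv @ w_adv)
    grad_w = x_adv.T @ ((p - y) * lam) / len(y)
    return w - lr * grad_w
```

For this logistic model, the FGSM step provably increases the loss of every sample (the margin moves by $\pm\epsilon\|w\|_1$ against the label), so the weighting in `instance_weights` genuinely separates more- from less-vulnerable instances.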