Adversarial training, which enhances robustness against adversarial attacks, has received much attention because human-imperceptible perturbations of data that deceive a given deep neural network are easy to generate. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to existing algorithms. A novel feature of the proposed algorithm is that it applies more regularization to data vulnerable to adversarial attacks than existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as minimizing a regularized empirical risk motivated by a newly derived upper bound on the robust risk. Numerical experiments illustrate that the proposed algorithm improves generalization (accuracy on clean examples) and robustness (accuracy under adversarial attacks) simultaneously, achieving state-of-the-art performance.
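The abstract does not specify the algorithm's details, so as a rough illustration of the general idea only — weighting a robustness penalty more heavily on examples that are vulnerable to attack — here is a minimal NumPy sketch. The model (logistic regression), the attack (one-step FGSM), and the vulnerability-based weighting scheme are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, b, X, y, eps):
    """One-step FGSM attack on the logistic loss: move each input in the
    direction that increases its loss (gradient of loss w.r.t. x is (p - y) * w)."""
    p = sigmoid(X @ w + b)
    return X + eps * np.sign(np.outer(p - y, w))

def weighted_adversarial_train(X, y, eps=0.2, lr=0.5, lam=1.0, steps=300, seed=0):
    """Adversarial training where vulnerable examples receive a larger
    regularization weight (an assumed, illustrative weighting scheme)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.01, size=d)
    b = 0.0
    for _ in range(steps):
        X_adv = fgsm(w, b, X, y, eps)
        p = sigmoid(X @ w + b)
        p_adv = sigmoid(X_adv @ w + b)
        # Vulnerability score: how far the attack shifts each prediction.
        vuln = np.abs(p - p_adv)
        weight = vuln / (vuln.mean() + 1e-12)  # more weight on vulnerable points
        # Gradient of clean cross-entropy plus weighted adversarial cross-entropy.
        grad_w = (X.T @ (p - y) + lam * X_adv.T @ (weight * (p_adv - y))) / n
        grad_b = ((p - y) + lam * weight * (p_adv - y)).mean()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, well-separated two-class data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = weighted_adversarial_train(X, y)
clean_acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()
X_adv = fgsm(w, b, X, y, eps=0.2)
robust_acc = ((sigmoid(X_adv @ w + b) > 0.5) == y).mean()
```

On this toy data the trained classifier keeps high accuracy both on clean inputs and under the same FGSM attack used during training; the point of the sketch is only the shape of the objective (clean loss plus a per-example-weighted adversarial term), not any claim about the paper's actual regularizer.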