Adversarial training, which aims to enhance robustness against adversarial attacks, has received much attention because human-imperceptible perturbations of data that deceive a given deep neural network are easy to generate. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to existing algorithms. A novel feature of the proposed algorithm is that it applies more regularization to data vulnerable to adversarial attacks than existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as minimizing a regularized empirical risk derived from a newly established upper bound on the robust risk. Numerical experiments illustrate that the proposed algorithm simultaneously improves generalization (accuracy on clean examples) and robustness (accuracy under adversarial attacks), achieving state-of-the-art performance.
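To make the idea of vulnerability-aware regularization concrete, the following is a minimal PyTorch sketch of a TRADES-style training objective in which the robustness regularizer is scaled per example by a vulnerability score. This is an illustration only, not the paper's actual algorithm: the score (one minus the adversarial confidence on the true label), the weighting scheme, and the hyperparameter `lam` are all assumptions, and `x_adv` is assumed to be produced by an inner attack such as PGD.

```python
import torch
import torch.nn.functional as F

def vulnerability_weighted_loss(model, x, y, x_adv, lam=6.0):
    """Hedged sketch: regularized empirical risk where vulnerable examples
    (those the attack pushes away from the true label) receive a larger
    regularization weight. The exact weighting of the paper is not shown."""
    logits_clean = model(x)
    logits_adv = model(x_adv)

    # Standard cross-entropy on clean inputs (generalization term).
    ce = F.cross_entropy(logits_clean, y)

    # Per-example KL divergence between adversarial and clean predictive
    # distributions (TRADES-style robustness regularizer).
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_clean, dim=1),
                  reduction="none").sum(dim=1)

    # Hypothetical vulnerability score: 1 - probability assigned to the
    # true label under attack; more vulnerable => more regularization.
    with torch.no_grad():
        p_true = F.softmax(logits_adv, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        weight = 1.0 - p_true

    return ce + lam * (weight * kl).mean()
```

In a training loop this loss would replace a uniform adversarial objective, so that examples near the decision boundary contribute the most to the regularization term, matching the abstract's description of putting more regularization on data vulnerable to adversarial attacks.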