Adversarial training, which aims to enhance robustness against adversarial attacks, has received much attention because human-imperceptible perturbations that deceive a given deep neural network are easy to generate. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to existing algorithms. A novel feature of the proposed algorithm is its data-adaptive regularization for robustifying a prediction model: we apply stronger regularization to data that are more vulnerable to adversarial attacks and weaker regularization to data that are less vulnerable. Although the idea of data-adaptive regularization is not new, ours has a firm theoretical basis: it reduces an upper bound of the robust risk. Numerical experiments illustrate that the proposed algorithm improves generalization (accuracy on clean samples) and robustness (accuracy under adversarial attacks) simultaneously, achieving state-of-the-art performance.
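The data-adaptive idea above can be sketched in a toy setting. The snippet below is only an illustration, not the paper's algorithm: it uses a linear classifier, an FGSM-style perturbation, a hypothetical per-example vulnerability score (the loss increase under attack), and an assumed weighting rule (a sigmoid of that score) so that more vulnerable examples receive a larger regularization weight.

```python
import numpy as np

# Toy data and a linear (logistic) classifier; all names here
# (vuln, lam, eps) are illustrative, not the paper's notation.
rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(size=d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    # Per-example binary cross-entropy
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

eps = 0.1
p_clean = sigmoid(X @ w)

# FGSM-style attack: step in the sign of the input gradient of the loss.
# For logistic loss, dL/dx = (p - y) * w.
grad_x = (p_clean - y)[:, None] * w[None, :]
X_adv = X + eps * np.sign(grad_x)
p_adv = sigmoid(X_adv @ w)

# Per-example vulnerability: how much the loss grows under attack.
vuln = bce(p_adv, y) - bce(p_clean, y)

# Data-adaptive weights: more regularization for more vulnerable points.
lam = sigmoid(vuln)

# Objective: clean loss plus adaptively weighted consistency penalty.
loss = np.mean(bce(p_clean, y) + lam * (p_adv - p_clean) ** 2)
```

The key design point is that `lam` varies across examples, so the consistency penalty is concentrated on points whose predictions are easily flipped, rather than being applied uniformly as in standard regularized adversarial training.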