Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples. Adversarial training (AT) is a popular and effective strategy for defending against adversarial attacks. Recent works (Benz et al., 2020; Xu et al., 2021; Tian et al., 2021) have shown that a robust model well-trained by AT exhibits a remarkable robustness disparity among classes, and they propose various methods to obtain consistent robust accuracy across classes. Unfortunately, these methods sacrifice a good deal of average robust accuracy. Accordingly, this paper proposes a novel framework of worst-class adversarial training and leverages no-regret dynamics to solve the resulting optimization problem. Our goal is to obtain a classifier that performs well on the worst class while sacrificing only a little average robust accuracy. We then rigorously analyze the theoretical properties of the proposed algorithm and derive a generalization error bound in terms of the worst-class robust risk. Furthermore, we propose a measurement that evaluates a method with respect to both the average and worst-class accuracies. Experiments on various datasets and networks show that our proposed method outperforms state-of-the-art approaches.
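The abstract does not spell out the algorithm, but the "no-regret dynamics" it refers to can be illustrated with a minimal, hypothetical sketch: a multiplicative-weights (Hedge) update that reweights classes between training rounds, so that classes with higher robust loss receive exponentially more weight in the next round. The function and variable names below are illustrative, not the paper's actual implementation.

```python
import numpy as np

def hedge_update(weights, per_class_losses, eta=0.5):
    """One Hedge (multiplicative-weights) step: classes suffering higher
    robust loss are upweighted exponentially for the next training round."""
    w = weights * np.exp(eta * per_class_losses)
    return w / w.sum()  # renormalize to a distribution over classes

# Toy simulation with stand-in per-class robust losses; in the real
# framework these would come from adversarially evaluating the model.
num_classes = 3
weights = np.full(num_classes, 1.0 / num_classes)
for _ in range(20):
    losses = np.array([0.1, 0.2, 0.8])  # class 2 is persistently worst
    weights = hedge_update(weights, losses)

# Over repeated rounds, the weight mass concentrates on the worst class,
# steering training effort toward the worst-class robust risk.
```

Under these dynamics, the weighted training objective approximates a minimax game between the learner and an adversary choosing the hardest class, which is the standard way no-regret updates are used to optimize a worst-case objective.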