Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks. However, robust overfitting, i.e., a sharp drop in robustness at a certain stage of training, consistently occurs during AT. Closing this robust generalization gap is essential for obtaining a robust model. In this paper, we present an in-depth study of robust overfitting from a new angle. We observe that consistency regularization, a popular technique in semi-supervised learning, shares a similar goal with AT and can be used to alleviate robust overfitting. We empirically validate this observation and find that a majority of prior solutions have implicit connections to consistency regularization. Motivated by this, we introduce a new AT solution that integrates consistency regularization and the Mean Teacher (MT) strategy into AT. Specifically, we introduce a teacher model whose weights are the average of the student model's weights over the training steps. We then design a consistency loss that makes the student model's prediction distribution on adversarial examples consistent with the teacher model's prediction distribution on the corresponding clean samples. Experiments show that our proposed method effectively alleviates robust overfitting and improves the robustness of DNN models against common adversarial attacks.
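A minimal sketch of how such a Mean-Teacher consistency scheme is commonly instantiated (the abstract does not state the exact formulas, so the EMA decay $\alpha$, the KL-divergence form of the consistency term, and the trade-off weight $\lambda$ below are assumptions, not the paper's stated loss):

$$\theta^{\mathrm{tea}}_t = \alpha\,\theta^{\mathrm{tea}}_{t-1} + (1-\alpha)\,\theta^{\mathrm{stu}}_t,$$

$$\mathcal{L} = \mathcal{L}_{\mathrm{AT}}\!\big(f_{\theta^{\mathrm{stu}}}(x^{\mathrm{adv}}),\, y\big) + \lambda\,\mathrm{KL}\!\big(f_{\theta^{\mathrm{tea}}}(x)\,\big\|\, f_{\theta^{\mathrm{stu}}}(x^{\mathrm{adv}})\big),$$

where $x^{\mathrm{adv}}$ is the adversarial example crafted from the clean sample $x$; the first term is the standard adversarial training loss, and the KL term pulls the student's prediction distribution on $x^{\mathrm{adv}}$ toward the teacher's distribution on the clean $x$, matching the consistency objective described above.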