It is well known that deep learning models have a propensity for fitting the entire training set even with random labels, which requires memorization of every training sample. In this paper, we investigate the memorization effect in adversarial training (AT) to promote a deeper understanding of the capacity, convergence, generalization, and especially robust overfitting of adversarially trained classifiers. We first demonstrate that deep networks have sufficient capacity to memorize adversarial examples of training data with completely random labels, but that not all AT algorithms can converge under this extreme circumstance. Our study of AT with random labels motivates further analyses of the convergence and generalization of AT. We find that some AT methods suffer from a gradient instability issue, and that the recently proposed complexity measures fail to explain robust generalization once models trained on random labels are taken into account. Furthermore, we identify a significant drawback of memorization in AT: it can result in robust overfitting. We then propose a new mitigation algorithm motivated by detailed memorization analyses. Extensive experiments on various datasets validate the effectiveness of the proposed method.
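As a concrete illustration of the random-label setting studied here, below is a minimal sketch of PGD adversarial training on randomly relabeled data. It assumes PyTorch and a torchvision-style dataset exposing a `.targets` attribute; the attack budget, optimizer settings, and helper names (`pgd_attack`, `randomize_labels`, `pgd_at`) are illustrative assumptions, not the paper's exact protocol.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L_inf PGD: find a perturbation of x that maximizes the loss w.r.t. label y."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def randomize_labels(dataset, num_classes, seed=0):
    """Replace every label with a fixed random one, consistent across epochs
    (assumes a torchvision-style dataset with a .targets attribute)."""
    g = torch.Generator().manual_seed(seed)
    dataset.targets = torch.randint(num_classes, (len(dataset),), generator=g).tolist()
    return dataset

def pgd_at(model, loader, epochs=100, device="cuda"):
    """Standard PGD adversarial training. On randomly relabeled data, robust
    training accuracy approaching 100% would indicate that the network has
    memorized adversarial examples of every training sample."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    for epoch in range(epochs):
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)    # attack w.r.t. the random label
            logits = model(x_adv)
            loss = F.cross_entropy(logits, y)  # train to fit the random label anyway
            opt.zero_grad()
            loss.backward()
            opt.step()
            correct += (logits.argmax(1) == y).sum().item()
            total += y.numel()
        print(f"epoch {epoch}: robust train acc = {correct / total:.3f}")
```

In this sketch one would call `randomize_labels` on the training set once before building the loader, so each sample keeps the same random label for the entire run; per-epoch relabeling would prevent memorization from being measured.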