Standard adversarial training approaches suffer from robust overfitting: the robust accuracy decreases when models are adversarially trained for too long. The origin of this problem is still unclear and conflicting explanations have been reported, i.e., memorization effects induced either by large-loss data or by small-loss data, and growing differences in the loss distribution of training samples as adversarial training progresses. Consequently, several mitigation approaches, including early stopping, temporal ensembling, and weight perturbations on small-loss data, have been proposed to reduce the effect of robust overfitting. However, a side effect of these strategies is a larger reduction in clean accuracy compared to standard adversarial training. In this paper, we investigate whether these mitigation approaches are complementary to each other in improving adversarial training performance. We further propose the use of helper adversarial examples, which can be obtained at minimal cost during adversarial example generation, and show how they increase the clean accuracy of the existing approaches without compromising robust accuracy.
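For context, the sketch below shows a standard PGD-based adversarial example generation step of the kind the abstract refers to; its intermediate iterates are produced as a by-product and are one way "helper" examples could be collected at essentially no extra cost. This is an illustrative assumption only: the function name, hyperparameters (eps, alpha, steps), and the use of intermediate iterates as helpers are not taken from the paper, whose helper construction may differ.

```python
import torch
import torch.nn.functional as F

def pgd_with_helpers(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative PGD attack (L-infinity) that also returns intermediate
    iterates. Collecting these iterates as 'helper' adversarial examples
    adds no extra forward/backward passes; whether the paper uses this
    particular construction is an assumption, not a claim."""
    # Random start inside the epsilon ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    helpers = []  # intermediate adversarial iterates, obtained for free
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the sign of the gradient, then project back.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
        helpers.append(x_adv.detach())
    return x_adv.detach(), helpers
```

In an adversarial training loop, the final iterate would be used for the robust loss as usual, while the stored intermediate iterates could supply additional (weaker) perturbed samples without increasing the attack cost.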