Adversarial training (AT) is currently one of the most successful methods for obtaining adversarial robustness in deep neural networks. However, the phenomenon of robust overfitting, i.e., the robust accuracy starts to decrease significantly during AT, has been problematic: it not only forces practitioners to rely on a bag of tricks for successful training, e.g., early stopping, but also incurs a significant generalization gap in robustness. In this paper, we propose an effective regularization technique that prevents robust overfitting by optimizing an auxiliary `consistency' regularization loss during AT. Specifically, we discover that data augmentation is a quite effective tool for mitigating overfitting in AT, and we develop a regularization that forces the predictive distributions after attacking two different augmentations of the same instance to be similar to each other. Our experimental results demonstrate that this simple regularization technique brings significant improvements in the test robust accuracy of a wide range of AT methods. More remarkably, we also show that our method can significantly help the model generalize its robustness against unseen adversaries, e.g., other types or larger perturbations than those used during training. Code is available at https://github.com/alinlab/consistency-adversarial.
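The consistency term described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the consistency loss is a symmetrized KL (Jensen-Shannon) divergence between the predictive distributions produced from two attacked augmentations of the same instance; the function names and the choice of divergence are illustrative.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def js_consistency(logits_a, logits_b, eps=1e-12):
    """Jensen-Shannon divergence between the predictive distributions
    of two attacked augmentations of the same batch of instances.
    logits_a, logits_b: arrays of shape (batch, num_classes)."""
    p, q = softmax(logits_a), softmax(logits_b)
    m = 0.5 * (p + q)
    kl = lambda x, y: (x * (np.log(x + eps) - np.log(y + eps))).sum(axis=-1)
    return 0.5 * (kl(p, m) + kl(q, m)).mean()
```

In training, this term would be added to the usual adversarial loss, so that the two attacked views of each instance are pushed toward the same predictive distribution.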