Adversarial training (AT) is currently one of the most successful methods for obtaining adversarial robustness in deep neural networks. However, the phenomenon of robust overfitting, i.e., the robust accuracy starts to decrease significantly during AT, has been problematic, not only forcing practitioners to rely on a bag of tricks for successful training, e.g., early stopping, but also incurring a significant generalization gap in robustness. In this paper, we propose an effective regularization technique that prevents robust overfitting by optimizing an auxiliary 'consistency' regularization loss during AT. Specifically, it forces the predictive distributions of adversarial examples generated from two different augmentations of the same instance to be similar to each other. Our experimental results demonstrate that such a simple regularization technique brings significant improvements in the test robust accuracy of a wide range of AT methods. More remarkably, we also show that our method can significantly help the model generalize its robustness against unseen adversaries, e.g., other types or larger magnitudes of perturbations than those used during training. Code is available at https://github.com/alinlab/consistency-adversarial.
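As a minimal sketch of the consistency idea, the snippet below (a hypothetical PyTorch setup, not the paper's reference implementation) penalizes disagreement between the predictive distributions of adversarial examples crafted from two augmentations of the same image; the names `model`, `augment`, `pgd_attack`, and the weight `lam` are placeholders, and the exact divergence and weighting used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_adv1: torch.Tensor, logits_adv2: torch.Tensor) -> torch.Tensor:
    """Symmetric (Jensen-Shannon-style) divergence between the two
    predictive distributions; one possible instantiation of the
    consistency regularizer described in the abstract."""
    p1 = F.softmax(logits_adv1, dim=1)
    p2 = F.softmax(logits_adv2, dim=1)
    m = 0.5 * (p1 + p2)
    # F.kl_div expects log-probabilities as the first argument,
    # so kl_div(m.log(), p) computes KL(p || m).
    kl1 = F.kl_div(m.log(), p1, reduction="batchmean")
    kl2 = F.kl_div(m.log(), p2, reduction="batchmean")
    return 0.5 * (kl1 + kl2)

def training_step(model, x, y, augment, pgd_attack, lam=1.0):
    """One AT step with the auxiliary consistency term (sketch).

    `augment` produces a random view of x; `pgd_attack(model, x, y)`
    returns adversarial examples - both are assumed helpers here.
    """
    x1, x2 = augment(x), augment(x)            # two views of each instance
    x1_adv = pgd_attack(model, x1, y)          # attack each view separately
    x2_adv = pgd_attack(model, x2, y)
    logits1, logits2 = model(x1_adv), model(x2_adv)
    # Standard adversarial-training loss on both views ...
    at_loss = 0.5 * (F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y))
    # ... plus the consistency regularizer between their predictions.
    return at_loss + lam * consistency_loss(logits1, logits2)
```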