Adversarial training (AT) is currently one of the most successful methods for obtaining adversarial robustness in deep neural networks. However, a significant generalization gap in the robustness obtained from AT has been problematic, leading practitioners to rely on a bag of tricks for successful training, e.g., early stopping. In this paper, we investigate data augmentation (DA) techniques to address this issue. In contrast to previous reports in the literature that DA is not effective for regularizing AT, we discover that DA can mitigate overfitting in AT surprisingly well, but it must be chosen deliberately. To further leverage the effect of DA, we propose a simple yet effective auxiliary 'consistency' regularization loss, which forces the predictive distributions obtained after attacking two different augmentations of the same input to be similar to each other. Our experimental results demonstrate that this simple regularization scheme is applicable to a wide range of AT methods, yielding consistent and significant improvements in test robust accuracy. More remarkably, we also show that our method can significantly help the model generalize its robustness against unseen adversaries, e.g., other types of perturbations or larger perturbations than those used during training. Code is available at https://github.com/alinlab/consistency-adversarial.
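To make the consistency idea concrete, below is a minimal PyTorch sketch of the auxiliary loss described above. It is an illustration under stated assumptions, not the authors' exact implementation: `model`, `augment`, and `pgd_attack` are user-supplied placeholders, and Jensen-Shannon divergence is used here as one natural choice of similarity measure between the two predictive distributions.

```python
# Hypothetical sketch of a consistency regularization term for adversarial
# training. Assumptions (not from the paper's code): `model` maps images to
# logits, `augment` applies a stochastic augmentation, and `pgd_attack`
# crafts adversarial examples under the training threat model.
import torch
import torch.nn.functional as F

def js_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two batches of distributions."""
    m = 0.5 * (p + q)
    # F.kl_div(input, target) expects log-probabilities as `input` and
    # computes KL(target || input); JS averages the two KL terms to m.
    return 0.5 * (F.kl_div(m.log(), p, reduction="batchmean")
                  + F.kl_div(m.log(), q, reduction="batchmean"))

def consistency_loss(model, x, augment, pgd_attack):
    # Two independent stochastic augmentations of the same clean batch.
    x1, x2 = augment(x), augment(x)
    # Attack each augmented view separately.
    x1_adv = pgd_attack(model, x1)
    x2_adv = pgd_attack(model, x2)
    # Predictive distributions on the two adversarial views.
    p1 = F.softmax(model(x1_adv), dim=1)
    p2 = F.softmax(model(x2_adv), dim=1)
    # Penalize disagreement between the two distributions.
    return js_divergence(p1, p2)
```

In a full training loop, this term would be added to the base adversarial training objective with a weighting hyperparameter, e.g., `loss = at_loss + lam * consistency_loss(model, x, augment, pgd_attack)`, so that robustness and cross-augmentation consistency are optimized jointly.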