Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training models that are robust against such attacks. However, it is much slower than vanilla training of neural networks, since it needs to construct adversarial examples for the entire training set at every iteration, hampering its practicality. Recently, Fast Adversarial Training (FAT) was proposed as a way to obtain robust models efficiently. However, the reasons behind its success are not fully understood and, more importantly, it can only train models robust to $\ell_\infty$-bounded attacks, as it relies on FGSM during training. In this paper, by leveraging the theory of coreset selection, we show how selecting a small subset of the training data provides a general, more principled approach to reducing the time complexity of robust training. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training (PAT). Our experimental results indicate that our approach speeds up adversarial training by 2-3 times with only a slight reduction in clean and robust accuracy.
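To make the core idea concrete, the following is a minimal sketch of subset-based adversarial training on a toy logistic-regression model. It is not the paper's method: the coreset-selection step is replaced here by a random-subset stand-in, and FGSM is used as the inner attack purely for illustration. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def fgsm_perturb(x, y, w, eps):
    """FGSM for logistic regression: x + eps * sign(grad_x loss).

    Loss is log(1 + exp(-y * (x @ w))) with labels y in {-1, +1};
    its gradient w.r.t. x is -y * sigmoid(-y * (x @ w)) * w.
    """
    margin = y * (x @ w)
    coeff = -y / (1.0 + np.exp(margin))        # shape (n,)
    grad_x = coeff[:, None] * w[None, :]       # shape (n, d)
    return x + eps * np.sign(grad_x)

def subset_adv_train(X, y, eps=0.1, lr=0.5, frac=0.3, epochs=50, seed=0):
    """Adversarial training where each step only attacks a small subset.

    A principled coreset-selection rule would go where the random
    subsampling is; this sketch only illustrates the cost structure
    (adversarial examples are built for k << n points per iteration).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    k = max(1, int(frac * n))
    for _ in range(epochs):
        idx = rng.choice(n, size=k, replace=False)  # stand-in for coreset selection
        Xs, ys = X[idx], y[idx]
        # Inner maximization: craft adversarial examples for the subset only.
        Xadv = fgsm_perturb(Xs, ys, w, eps)
        # Outer minimization: gradient step on the adversarial loss.
        margin = ys * (Xadv @ w)
        coeff = -ys / (1.0 + np.exp(margin))
        grad_w = (coeff[:, None] * Xadv).mean(axis=0)
        w -= lr * grad_w
    return w
```

Because the inner attack (the expensive part of adversarial training) runs on only a `frac` fraction of the data per iteration, the per-epoch cost drops roughly proportionally, which is the source of the 2-3x speed-up the abstract refers to.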