Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training models that are robust against such attacks. Unfortunately, this method is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively. Once built, we run adversarial training over this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times, with only a slight degradation in clean and robust accuracy.
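To make the two-stage idea concrete, the sketch below illustrates one possible instantiation in PyTorch: a coreset is chosen greedily so that its (cheap, last-layer) gradients approximate the full-data gradient, and standard $\ell_\infty$-PGD adversarial training is then run only on that subset. This is a minimal, hypothetical sketch, not the paper's implementation; the helper names (`pgd_attack`, `select_coreset`, `adversarial_coreset_epoch`), the simplified greedy matching rule, the logit-gradient proxy, and the attack hyperparameters are all assumptions made for illustration.

```python
# Illustrative sketch (assumed, simplified): coreset selection by gradient
# matching followed by PGD adversarial training on the selected subset.
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard l_inf PGD: the inner maximization of adversarial training."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def select_coreset(per_sample_grads, k):
    """Greedy selection: repeatedly add the sample whose gradient best covers
    what is still missing from the full-data gradient sum (a simplified
    gradient-matching rule, used here only for illustration)."""
    full_grad = per_sample_grads.sum(dim=0)
    selected, residual = [], full_grad.clone()
    for _ in range(k):
        scores = per_sample_grads @ residual      # alignment with the residual
        if selected:
            scores[selected] = -float("inf")      # do not pick a sample twice
        idx = int(scores.argmax())
        selected.append(idx)
        residual = residual - per_sample_grads[idx]
    return selected


def adversarial_coreset_epoch(model, opt, dataset, k, batch_size=128):
    # Cheap per-sample gradient proxy: gradient of the cross-entropy loss
    # w.r.t. the logits (softmax output minus one-hot label), which avoids
    # storing full per-parameter gradients.
    model.eval()
    grads = []
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    with torch.no_grad():
        for x, y in loader:
            p = F.softmax(model(x), dim=1)
            grads.append(p - F.one_hot(y, p.size(1)).float())
    idx = select_coreset(torch.cat(grads), k)

    # Adversarial training restricted to the selected coreset.
    model.train()
    sub = torch.utils.data.Subset(dataset, idx)
    for x, y in torch.utils.data.DataLoader(sub, batch_size=batch_size, shuffle=True):
        x_adv = pgd_attack(model, x, y)
        opt.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        opt.step()
```

In this sketch the adversarial examples are constructed only for the $k$ selected samples rather than for the entire training set, which is the source of the reported 2-3x speed-up; the same outer loop could wrap other objectives such as TRADES or Perceptual Adversarial Training by swapping the inner attack and loss.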