Adversarial training (AT) is a widely recognized defense mechanism for improving the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., the defender) seeks a robust model that minimizes the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., the attacker). However, the conventional MMO method makes AT hard to scale. Thus, Fast-AT and other recent algorithms attempt to simplify MMO by replacing its maximization step with a single gradient sign-based attack generation step. Although easy to implement, Fast-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory because it suffers from robust catastrophic overfitting when trained with strong adversaries. In this paper, we advance Fast-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly used Fast-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm's performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bi-level AT (Fast-BAT), which effectively defends against sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show that our method yields substantial robustness improvements over multiple baselines across multiple models and datasets. All code for reproducing the experiments in this paper is at https://github.com/NormalUhr/Fast_BAT.
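To make the single gradient sign-based attack generation step concrete, the following is a minimal sketch on a toy linear classifier with logistic loss. It is an illustration under stated assumptions and not the paper's implementation: the function names (`sign_gradient_attack`, `fast_at_step`), the linear model, and the hyperparameters `eps` and `lr` are all hypothetical, and the sketch omits details that practical Fast-AT variants may include. It shows only the baseline sign-based step that Fast-BAT is designed to avoid, not Fast-BAT itself.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def logistic_loss(w, x, y):
    # Logistic loss of a linear classifier; y is assumed to be -1 or +1.
    return math.log1p(math.exp(-y * dot(w, x)))

def sign_gradient_attack(w, x, y, eps):
    # Maximizer (attacker): one sign-gradient step of size eps,
    # which stays inside the l_inf ball of radius eps by construction.
    coeff = -y * sigmoid(-y * dot(w, x))   # d(loss)/d(w.x)
    grad_x = [coeff * wi for wi in w]      # gradient of the loss w.r.t. x
    return [eps * sign(g) for g in grad_x]

def fast_at_step(w, x, y, eps, lr):
    # Minimizer (defender): one gradient step on the loss evaluated
    # at the perturbed input x + delta.
    delta = sign_gradient_attack(w, x, y, eps)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    coeff = -y * sigmoid(-y * dot(w, x_adv))
    grad_w = [coeff * xi for xi in x_adv]
    w_new = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w_new, delta

if __name__ == "__main__":
    w, x, y = [1.0, -2.0, 0.5], [0.2, 0.1, -0.3], 1
    w_new, delta = fast_at_step(w, x, y, eps=0.1, lr=0.1)
    print("perturbation:", delta)
    print("updated weights:", w_new)
```

The `sign` call inside `sign_gradient_attack` is the discrete operation that the abstract identifies as the obstacle to analyzing the algorithm's performance.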