Recently, Wong et al. showed that adversarial training with single-step FGSM leads to a characteristic failure mode named catastrophic overfitting (CO), in which a model becomes suddenly vulnerable to multi-step attacks. They showed that adding a random perturbation prior to FGSM (RS-FGSM) seemed to be sufficient to prevent CO. However, Andriushchenko and Flammarion observed that RS-FGSM still leads to CO for larger perturbations, and proposed an expensive regularizer (GradAlign) to avoid CO. In this work, we methodically revisit the role of noise and clipping in single-step adversarial training. Contrary to previous intuitions, we find that using a stronger noise around the clean sample combined with not clipping is highly effective in avoiding CO for large perturbation radii. Based on these observations, we then propose Noise-FGSM (N-FGSM) that, while providing the benefits of single-step adversarial training, does not suffer from CO. Empirical analyses on a large suite of experiments show that N-FGSM is able to match or surpass the performance of previous single-step methods while achieving a 3$\times$ speed-up.
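To make the described recipe concrete, the following is a minimal PyTorch-style sketch of the single-step perturbation construction suggested by the abstract: draw strong uniform noise around the clean sample, take one FGSM step, and do not clip the combined perturbation back to the $\epsilon$-ball. The noise scale `k`, the step size equal to `eps`, and the final clamp to the valid image range are assumptions for illustration, not a verbatim reproduction of the paper's algorithm.

```python
import torch

def n_fgsm_perturbation(model, loss_fn, x, y, eps, k=2.0):
    """Sketch of a Noise-FGSM-style single-step attack (assumptions noted above)."""
    # Stronger noise than RS-FGSM: uniform in [-k*eps, k*eps] around the clean sample.
    eta = torch.empty_like(x).uniform_(-k * eps, k * eps)
    x_noisy = (x + eta).requires_grad_(True)

    # Single FGSM step on the noisy sample.
    loss = loss_fn(model(x_noisy), y)
    grad = torch.autograd.grad(loss, x_noisy)[0]
    x_adv = x_noisy + eps * grad.sign()

    # Unlike RS-FGSM, the combined perturbation is NOT projected back to the eps-ball;
    # clamping to the valid input range [0, 1] is an assumed standard post-processing step.
    return x_adv.clamp(0.0, 1.0).detach()
```

In a training loop, `x_adv` would replace the clean batch before the usual forward/backward pass, which is what keeps the cost close to a single additional forward-backward per step.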