PGD-based and FGSM-based methods are two popular adversarial training (AT) approaches for obtaining adversarially robust models. Compared with PGD-based AT, the FGSM-based one is significantly faster but fails with catastrophic overfitting (CO). There are two popular existing strategies for mitigating CO in such Fast AT: random start (RandStart) and Gradient Alignment (GradAlign). The former works only for a relatively small perturbation size of 8/255 under the l_\infty constraint, and GradAlign improves on it by extending the perturbation size to 16/255 (under the l_\infty constraint) but at the cost of being 3 to 4 times slower. How to avoid CO in Fast AT for a large perturbation size without increasing the computation overhead remains an unsolved issue, for which our work provides a frustratingly simple (yet effective) solution. Specifically, our solution lies in just noise augmentation (NoiseAug), which is a non-trivial byproduct of simplifying GradAlign. By simplifying GradAlign we obtain two findings: (i) aligning logits instead of gradients in GradAlign requires half the training time yet achieves higher performance than GradAlign; (ii) the alignment operation can also be removed by keeping only noise augmentation (NoiseAug). Simplified from GradAlign, our NoiseAug bears a surprising resemblance to RandStart except that we inject noise into the image instead of the perturbation. To understand why injecting noise into the input prevents CO, we verify that this is caused not by a data augmentation effect (injecting noise into the image) but by improved local linearity. We provide an intuitive explanation for why NoiseAug improves local linearity without explicit regularization. Extensive results demonstrate that our NoiseAug achieves state-of-the-art results in FGSM AT. The code will be released upon acceptance.
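For concreteness, the sketch below illustrates the core idea stated above: injecting noise into the image itself before the single-step FGSM attack, rather than using it only to initialize the perturbation as RandStart does. This is a minimal sketch, not the authors' released code; the PyTorch framing, the helper name fgsm_noiseaug_step, the noise magnitude noise_eps, and the [0, 1] image range and clipping details are illustrative assumptions.

```python
# Minimal sketch of one FGSM-AT training step with NoiseAug.
# Assumptions (not from the abstract): PyTorch, images in [0, 1],
# uniform noise of magnitude `noise_eps`; projection details may
# differ from the paper's released implementation.
import torch
import torch.nn.functional as F

def fgsm_noiseaug_step(model, x, y, optimizer, eps=8/255, noise_eps=8/255):
    # NoiseAug: add uniform noise to the clean image (the data point moves),
    # unlike RandStart, where noise only initializes the perturbation.
    x_aug = x + torch.empty_like(x).uniform_(-noise_eps, noise_eps)
    x_aug = x_aug.clamp(0.0, 1.0).detach().requires_grad_(True)

    # Single-step FGSM attack around the noise-augmented image.
    loss_adv = F.cross_entropy(model(x_aug), y)
    grad = torch.autograd.grad(loss_adv, x_aug)[0]
    x_adv = (x_aug + eps * grad.sign()).clamp(0.0, 1.0).detach()

    # Standard training update on the adversarial example.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```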