Recently, FGSM adversarial training has been shown to produce models whose robustness is comparable to PGD-trained ones while being an order of magnitude faster to train. However, it suffers from a failure mode called catastrophic overfitting (CO), in which the classifier suddenly loses its robustness during training and rarely recovers on its own. In this paper, we show that CO is not limited to FGSM but also occurs in $\mbox{DF}^{\infty}$-1 adversarial training. We then analyze the geometric properties of both FGSM and $\mbox{DF}^{\infty}$-1 and find that their decision boundaries differ completely after CO. For FGSM, a new decision boundary is generated along the perturbation direction, which makes small perturbations more effective than large ones. For $\mbox{DF}^{\infty}$-1, in contrast, no new decision boundary appears along the perturbation direction; instead, the perturbations it generates become smaller after CO and thus lose their effectiveness. We also experimentally examine three hypotheses on potential factors causing CO. Based on this empirical analysis, we modify RS-FGSM by not projecting the perturbation back onto the $l_\infty$ ball. With this small modification, we achieve $47.56 \pm 0.37\%$ PGD-50-10 accuracy on CIFAR10 with $\epsilon = 8/255$, compared to $43.57 \pm 0.30\%$ for RS-FGSM, and further extend the working range of $\epsilon$ from 8/255 to 11/255 on CIFAR10 without CO occurring.
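The proposed modification, removing the final $l_\infty$ projection from RS-FGSM, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rs_fgsm_perturbation`, the NumPy setting, and the step size $\alpha = 1.25\epsilon$ (a commonly used choice for RS-FGSM) are assumptions for illustration; only the presence or absence of the clipping step reflects the modification described above.

```python
import numpy as np

def rs_fgsm_perturbation(grad, epsilon, project=True, rng=None):
    """One-step RS-FGSM-style perturbation (illustrative sketch).

    Starts from a random point in the l_inf ball of radius epsilon and
    takes an FGSM step of size alpha = 1.25 * epsilon in the sign of the
    loss gradient. With project=True the result is clipped back into the
    epsilon-ball (standard RS-FGSM); with project=False the projection is
    skipped, which is the modification described in the abstract.
    """
    rng = rng or np.random.default_rng(0)
    alpha = 1.25 * epsilon  # assumed step size, as in common RS-FGSM setups
    delta = rng.uniform(-epsilon, epsilon, size=grad.shape)  # random start
    delta = delta + alpha * np.sign(grad)                    # FGSM step
    if project:
        delta = np.clip(delta, -epsilon, epsilon)            # l_inf projection
    return delta
```

Without the projection, the final perturbation may exceed the $\epsilon$-ball in the $l_\infty$ norm (its magnitude can reach up to $2.25\epsilon$ per coordinate under these assumptions), whereas standard RS-FGSM always returns a point inside the ball.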