While adversarial training and its variants have been shown to be the most effective defenses against adversarial attacks, their extremely slow training makes them hard to scale to large datasets like ImageNet. The key idea of recent work on accelerating adversarial training is to substitute multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-step methods suffer from catastrophic overfitting, where the accuracy against PGD attacks suddenly drops to nearly 0% during training, destroying the robustness of the network. In this work, we study this phenomenon from the perspective of training instances. We show that catastrophic overfitting is instance-dependent, and that fitting instances with larger gradient norms is more likely to cause it. Based on these findings, we propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS). ATAS learns an instance-wise adaptive step size that is inversely proportional to the instance's gradient norm. Our theoretical analysis shows that ATAS converges faster than the commonly adopted non-adaptive counterparts. Empirically, ATAS consistently mitigates catastrophic overfitting and achieves higher robust accuracy on CIFAR10, CIFAR100, and ImageNet across various adversarial budgets.
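The core idea — an instance-wise step size inversely proportional to the gradient norm, plugged into a single-step (FGSM-style) update — can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm; the function names, the clipping bounds, and the specific form `base_step / grad_norm` are assumptions for exposition.

```python
import numpy as np

def atas_step_sizes(grad_norms, base_step, min_step, eps=1e-8):
    """Instance-wise step sizes inversely proportional to each instance's
    gradient norm (hypothetical sketch): instances with large gradient
    norms, which are more likely to trigger catastrophic overfitting,
    receive smaller single-step attack step sizes."""
    steps = base_step / (grad_norms + eps)          # shape (B,)
    return np.clip(steps, min_step, base_step)      # keep steps in a sane range

def fgsm_with_adaptive_step(x, grad, grad_norms, epsilon, base_step, min_step):
    """FGSM-style perturbation with per-instance step sizes, with the
    perturbation kept inside the L-infinity ball of radius epsilon."""
    alpha = atas_step_sizes(grad_norms, base_step, min_step)
    delta = alpha[:, None] * np.sign(grad)          # broadcast step over features
    return x + np.clip(delta, -epsilon, epsilon)
```

For example, with `base_step = 8/255`, an instance whose gradient norm is 10 receives a much smaller step than one whose norm is 0.1, so the hard-to-fit, large-gradient instances are perturbed more conservatively.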