It is well known that adversarial attacks can fool deep neural networks with imperceptible perturbations. Although adversarial training significantly improves model robustness, failure cases of defense still broadly exist. In this work, we find that adversarial attacks can themselves be vulnerable to small perturbations. Namely, on adversarially-trained models, perturbing adversarial examples with a small random noise may invalidate their misled predictions. After carefully examining state-of-the-art attacks of various kinds, we find that all of them suffer from this deficiency to different extents. Motivated by this finding, we propose to counter attacks by crafting more effective defensive perturbations. Our defensive perturbations leverage the advantage that adversarial training endows the ground-truth class with smaller local Lipschitzness. By simultaneously attacking all the classes, the misled predictions with larger Lipschitzness can be flipped into correct ones. We verify our defensive perturbation with both empirical experiments and theoretical analyses on a linear model. On CIFAR10, it boosts the robust accuracy of the state-of-the-art model from 66.16% to 72.66% against the four attacks of AutoAttack, including an improvement from 71.76% to 83.30% against the Square attack. On ImageNet, the top-1 robust accuracy of FastAT is improved from 33.18% to 38.54% under the 100-step PGD attack.
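The core mechanism described above can be sketched concretely: maximize the sum of the losses over all classes within a small perturbation budget, so that the misled class (with larger local Lipschitzness) loses confidence faster than the ground-truth class. The following is a minimal PyTorch sketch of this idea, assuming an L-infinity threat model and PGD-style optimization; the function name `hedge_defense` and the hyperparameter defaults are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def hedge_defense(model, x, eps=8/255, alpha=2/255, steps=10):
    """Classify x after adding a defensive perturbation that
    simultaneously attacks all classes (illustrative sketch).

    The perturbation maximizes the cross-entropy loss summed over
    every candidate label, which equals -sum(log_softmax(logits)).
    Under adversarial training, the ground-truth class has smaller
    local Lipschitzness, so the misled prediction is flipped back.
    """
    # Random start inside the eps-ball (set requires_grad after the
    # in-place init, which is not allowed on a grad-requiring leaf).
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        logits = model((x + delta).clamp(0, 1))
        # Sum of cross-entropy over all classes and batch elements.
        loss = -F.log_softmax(logits, dim=1).sum()
        grad, = torch.autograd.grad(loss, delta)
        # PGD ascent step, projected back onto the eps-ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    with torch.no_grad():
        return model((x + delta).clamp(0, 1)).argmax(dim=1)
```

Note the contrast with a standard attack: instead of maximizing the loss of one (ground-truth) label, every label is attacked at once, so the perturbation is crafted without knowing the true class.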