Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attacks, which aim to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbations. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus adaptive defenses are necessary to maintain robustness.
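For concreteness, the sketch below shows standard PGD-based adversarial training with an explicit defense budget, the $\epsilon$ referred to in the abstract; it is a minimal PyTorch sketch, not the paper's implementation, and the helper names (`pgd_attack`, `adversarial_training_step`), step sizes, and the enlarged-budget value in the usage comment are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Craft L_inf-bounded adversarial examples via projected gradient descent."""
    # Random start inside the eps-ball around the clean inputs.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()  # gradient ascent on the loss
            delta.clamp_(-eps, eps)       # project back into the eps-ball
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, defense_eps, steps=10):
    """One adversarial-training step with defense budget defense_eps."""
    x_adv = pgd_attack(model, x, y, eps=defense_eps,
                       alpha=2.5 * defense_eps / steps, steps=steps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)  # train on the perturbed batch
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage: countering a stability attack amounts to enlarging the
# defense budget beyond the conventional value, e.g. 16/255 instead of 8/255
# (the exact choice depends on the attack budget and is an assumption here):
# loss = adversarial_training_step(model, optimizer, x_batch, y_batch,
#                                  defense_eps=16 / 255)
```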