Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model named stability attacks, which aim to hinder robust availability by slightly perturbing the training data. Under this threat, we find that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting when the non-robust features of the training data are reinforced by an $\epsilon$-bounded perturbation. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus that an adaptive defense is necessary to maintain robustness.
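The failure mode described above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the data model (one robust feature with mean $y$ and $d$ non-robust features with mean $\eta y$, where $\eta < \epsilon$), the parameter values, and the helper names. In particular, adversarial training of a linear classifier is approximated by a soft-thresholded class mean, which stands in for an actual min-max optimization.

```python
import numpy as np


def sample(n, d, eta, sigma, rng):
    """Assumed toy model: labels y in {-1, +1}; feature 0 is robust (mean y),
    features 1..d are non-robust (mean eta * y); isotropic Gaussian noise."""
    y = rng.choice([-1.0, 1.0], size=n)
    means = np.concatenate(([1.0], np.full(d, eta)))
    x = y[:, None] * means + sigma * rng.standard_normal((n, d + 1))
    return x, y


def stability_attack(x, y, eps):
    """Hypothetical stability attack: shift every non-robust feature by
    eps * y, an L-infinity perturbation of size eps that reinforces them."""
    x_poisoned = x.copy()
    x_poisoned[:, 1:] += eps * y[:, None]
    return x_poisoned


def soft_threshold_classifier(x, y, budget):
    """Stand-in for adversarial training with the given defense budget:
    shrink the empirical class mean toward zero by `budget` per coordinate."""
    mu = (y[:, None] * x).mean(axis=0)
    return np.sign(mu) * np.maximum(np.abs(mu) - budget, 0.0)


def robust_accuracy(w, x, y, eps):
    """Exact worst-case accuracy of the linear rule sign(w . x) under an
    L-infinity test-time attack of size eps."""
    margin = y * (x @ w) - eps * np.abs(w).sum()
    return float((margin > 0).mean())


def run_demo(seed=0, n=20000, d=100, eta=0.15, eps=0.3, sigma=1.0):
    rng = np.random.default_rng(seed)
    x_train, y_train = sample(n, d, eta, sigma, rng)
    x_test, y_test = sample(n, d, eta, sigma, rng)
    x_poisoned = stability_attack(x_train, y_train, eps)

    w_clean = soft_threshold_classifier(x_train, y_train, eps)
    w_poisoned = soft_threshold_classifier(x_poisoned, y_train, eps)
    w_enlarged = soft_threshold_classifier(x_poisoned, y_train, 2 * eps)

    return {
        "clean data, budget eps": robust_accuracy(w_clean, x_test, y_test, eps),
        "poisoned data, budget eps": robust_accuracy(w_poisoned, x_test, y_test, eps),
        "poisoned data, budget 2*eps": robust_accuracy(w_enlarged, x_test, y_test, eps),
    }


if __name__ == "__main__":
    for setting, acc in run_demo().items():
        print(f"{setting}: robust accuracy = {acc:.3f}")
```

Under these assumed parameters, the classifier fitted to poisoned data with budget $\epsilon$ keeps positive weight on the non-robust features and loses most of its test robustness, while doubling the budget removes those weights again. The factor of two is a choice made for this toy model, not a value stated in the abstract.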