Hundreds of defenses have been proposed to make deep neural networks robust against minimal (adversarial) input perturbations. However, only a handful of these defenses have held up to their claims, because correctly evaluating robustness is extremely challenging: weak attacks often fail to find adversarial examples even when they exist, thereby unknowingly making a vulnerable network look robust. In this paper, we propose a test to identify weak attacks and, thus, weak defense evaluations. Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample. Consequently, any correct attack must succeed in breaking this modified network. For eleven out of thirteen previously published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We hope that attack unit tests such as ours will become a major component of future robustness evaluations and increase confidence in an empirical field that is currently riddled with skepticism.
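To make the mechanism concrete, here is a minimal sketch in PyTorch of what such an attack unit test could look like. This is an illustrative assumption, not the paper's actual construction: the wrapper simply ensures that, for one registered test point, every query inside the attack budget (but beyond a small trigger radius) receives a fixed wrong label, so an adversarial example provably exists and any sound attack must find one. All names (`GuaranteedVulnerableModel`, `attack_unit_test`, the `attack` signature) are hypothetical.

```python
# Hypothetical sketch of an attack unit test; NOT the paper's construction.
import torch
import torch.nn as nn


class GuaranteedVulnerableModel(nn.Module):
    """Wraps `base` so an adversarial example provably exists for `x_clean`.

    Any query whose L-inf distance to the registered clean point lies in
    (trigger_radius, eps] is assigned a fixed wrong class, guaranteeing a
    misclassified point inside the eps-ball around `x_clean`.
    """

    def __init__(self, base: nn.Module, x_clean: torch.Tensor,
                 y_clean: int, num_classes: int,
                 eps: float, trigger_radius: float):
        super().__init__()
        assert trigger_radius < eps, "trigger must lie inside the budget"
        self.base = base
        self.register_buffer("x_clean", x_clean)    # shape (1, C, H, W)
        self.y_wrong = (y_clean + 1) % num_classes  # any class != y_clean
        self.eps, self.trigger_radius = eps, trigger_radius

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.base(x)
        # L-inf distance of each query to the registered clean point.
        dist = (x - self.x_clean).abs().flatten(1).amax(dim=1)
        flip = (dist > self.trigger_radius) & (dist <= self.eps)
        if flip.any():
            # Make the wrong class win for all flagged queries.
            boost = logits.detach().abs().amax(dim=1) + 10.0
            logits = logits.clone()
            logits[flip, self.y_wrong] = boost[flip]
        return logits


def attack_unit_test(attack, base_model, x, y, num_classes, eps):
    """Flags a weak attack: a correct attack must break the wrapped model.

    `attack` is any function (model, x, eps) -> x_adv; `x` is a single
    test point with batch dimension 1 and integer label `y`.
    """
    model = GuaranteedVulnerableModel(base_model, x, y, num_classes,
                                      eps, trigger_radius=eps / 2)
    x_adv = attack(model, x, eps)
    assert (x_adv - x).abs().max() <= eps, "attack left the eps-ball"
    # False here means the attack missed a guaranteed adversarial example,
    # i.e., it is too weak to trust in a defense evaluation.
    return int(model(x_adv).argmax(dim=1)) != y
```

Note that the hard distance threshold in this toy wrapper could itself mask gradients from a gradient-based attack; a usable test must ensure the modification stays findable by the attacks being evaluated, which is presumably why the paper's construction is more careful than this sketch.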