Deep neural networks are easily misled by adversarial examples. Although many defense methods have been proposed, many of them have been shown to lose effectiveness against properly performed adaptive attacks. How to evaluate adversarial robustness effectively is important for the realistic deployment of deep models, yet it remains unclear. To provide a reasonable solution, a primary step is to understand the error (or gap) between the true adversarial robustness and the evaluated one: what it is and why it exists. This paper takes several steps to clarify it. First, we introduce an interesting phenomenon named gradient traps, which lead to incompetent adversaries and are demonstrated to be a manifestation of evaluation error. Then, we analyze the error and identify three components, each caused by a specific compromise. Moreover, based on this analysis, we present our evaluation suggestions. Experiments on adversarial training and its variations indicate that (1) the error does exist empirically, and (2) these defenses are still vulnerable. We hope these analyses and results will help the community develop more powerful defenses.