Robustness to adversarial attacks is typically evaluated with adversarial accuracy. However, this metric is too coarse to properly capture all the robustness properties of machine learning models. Many defenses, when evaluated against a strong attack, yield no accuracy improvement while still contributing partially to adversarial robustness. Popular certification methods suffer from the same issue, as they only provide a lower bound on accuracy. To capture finer robustness properties, we propose a new metric for L2 robustness, adversarial angular sparsity, which partially answers the question "how many adversarial examples are there around an input?". We demonstrate its usefulness by evaluating both "strong" and "weak" defenses. We show that some state-of-the-art defenses, despite delivering very similar accuracy, can exhibit very different sparsity on the inputs on which they are not robust. We also show that some weak defenses actually decrease robustness, while others strengthen it to a degree that accuracy cannot capture. These differences are predictive of how useful such defenses can become when combined with adversarial training.
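As a purely illustrative reading of the question "how many adversarial examples are there around an input", below is a minimal sketch of one way such an angular sparsity measure could be estimated: sample random directions on the L2 sphere of radius eps around an input and measure the fraction of directions that do not flip the prediction. This is an assumption for illustration, not the paper's actual definition; the names `toy_classifier` and `angular_sparsity_estimate` and the parameters `eps` and `n_dirs` are hypothetical.

```python
# Illustrative sketch only: Monte Carlo estimate of how angularly sparse
# adversarial perturbations are around an input, under an assumed definition
# (fraction of random eps-norm L2 directions that do NOT change the label).
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier standing in for a trained model (hypothetical).
W = rng.normal(size=(10, 784))

def toy_classifier(x: np.ndarray) -> int:
    return int(np.argmax(W @ x))

def angular_sparsity_estimate(x: np.ndarray, eps: float = 1.0,
                              n_dirs: int = 1000) -> float:
    """Fraction of random L2 directions of norm eps that keep the label.

    A higher value means adversarial examples are angularly sparser around x.
    """
    label = toy_classifier(x)
    flips = 0
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape)
        d *= eps / np.linalg.norm(d)          # project onto the eps-sphere
        if toy_classifier(x + d) != label:    # this direction is adversarial
            flips += 1
    return 1.0 - flips / n_dirs

x = rng.normal(size=784)
print(f"estimated angular sparsity: {angular_sparsity_estimate(x):.3f}")
```

Random sampling is used here only because it makes the idea concrete in a few lines; a practical estimator would more plausibly run a directional attack along each sampled direction rather than testing a single point per direction.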