Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients have previously been found to exist in many defense methods and cause a false signal of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack toward a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose multi-targeted and ensemble versions of our MD attack. By investigating 24 defense models proposed since 2018, we find that 11 models are susceptible to a certain degree of imbalanced gradients, and our MD attack can reduce their robustness, as evaluated by the best standalone baseline attack, by more than 1%. We also provide an in-depth investigation into the likely causes of imbalanced gradients and effective countermeasures. Our code is available at https://github.com/HanxunH/MDAttack.
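To make the decomposition concrete, the sketch below splits the standard margin loss, L(x, y) = Z(x)_y - max_{i≠y} Z(x)_i, into its two terms, as the abstract describes. This is an illustrative assumption about the loss form only; the paper's actual two-stage attack schedule and objectives are not reproduced here.

```python
import numpy as np

def margin_loss_terms(logits: np.ndarray, y: int):
    """Decompose the margin loss into its two individual terms.

    Margin loss (CW-style): L = z_y - max_{i != y} z_i.
    When the gradient of one term dominates the other (imbalanced
    gradients), following the gradient of the full margin can push
    the attack in a suboptimal direction; an MD-style attack would
    instead optimize each term in separate stages.
    """
    z_y = logits[y]                       # correct-class term
    z_other = np.delete(logits, y).max()  # strongest wrong-class term
    return z_y, z_other

# Toy example with 3 classes; true class is 0.
logits = np.array([3.0, 1.0, 0.5])
z_y, z_other = margin_loss_terms(logits, y=0)
margin = z_y - z_other  # 3.0 - 1.0 = 2.0 (positive: still classified correctly)
```

Here a hypothetical two-stage attack would first perturb the input to decrease `z_y`, then switch to increasing `z_other`, so that a vanishing gradient in one term cannot stall the whole attack.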