Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients, a type of gradient masking, have previously been found to exist in many defense methods and to give a false sense of robustness. In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness. The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack in a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes the margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose MultiTargeted and ensemble versions of our MD attack. By investigating 17 defense models proposed since 2018, we find that 6 models are susceptible to imbalanced gradients, and our MD attack can reduce their robustness, as evaluated by the best standalone baseline attack, by a further 2%. We also provide an in-depth analysis of the likely causes of imbalanced gradients and effective countermeasures.
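To make the two-stage idea concrete, below is a minimal PGD-style sketch of a margin-decomposition attack. It is an illustration under assumptions, not the paper's reference implementation: stage 1 perturbs the input using only one decomposed term of the margin loss (here, the true-class logit), and stage 2 continues from that point with the full margin loss. The step size, stage lengths, and the choice of term used in stage 1 are hypothetical.

```python
# Illustrative sketch of a margin-decomposition-style L_inf attack (assumptions noted above).
import torch

def md_attack_sketch(model, x, y, eps=8/255, alpha=2/255,
                     steps_stage1=10, steps_stage2=40):
    """Two-stage attack: stage 1 optimizes a single decomposed term,
    stage 2 continues with the full margin loss z_other - z_y."""
    x_adv = x.clone().detach()
    for step in range(steps_stage1 + steps_stage2):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # true-class logit z_y
        z_y = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # largest logit among the other classes
        masked = logits.clone()
        masked.scatter_(1, y.unsqueeze(1), float("-inf"))
        z_other = masked.max(dim=1).values
        if step < steps_stage1:
            loss = (-z_y).sum()            # stage 1: one term of the margin loss
        else:
            loss = (z_other - z_y).sum()   # stage 2: full margin loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            # project back into the eps-ball and valid pixel range
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```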