Adversarial training is an effective approach for making deep neural networks robust against adversarial attacks. Recently, various adversarial training defenses have been proposed that not only maintain high clean accuracy but also show significant robustness against well-studied adversarial attacks such as PGD. However, high adversarial robustness can also arise when an attack fails to find adversarial gradient directions, a phenomenon known as `gradient masking'. In this work, we analyse the effect of label smoothing on adversarial training as one potential cause of gradient masking. We then develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA). Our attack is based on a `match and deceive' loss that finds optimal adversarial directions through guidance from a surrogate model. The modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size. Furthermore, our proposed G-PGA is generic and can therefore be combined with an ensemble attack strategy, as we demonstrate for the case of Auto-Attack, leading to improvements in efficiency and convergence speed. Beyond being an effective attack, G-PGA can serve as a diagnostic tool to reveal elusive robustness caused by gradient masking in adversarial defenses.
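For readers unfamiliar with the baseline that G-PGA modifies, the core PGD loop can be sketched as signed gradient ascent on the loss, followed by projection back onto the allowed perturbation ball. The sketch below uses a toy one-dimensional loss with an analytic gradient; the loss, step size, and iteration count are illustrative assumptions, not the paper's G-PGA implementation.

```python
# Minimal sketch of a PGD-style attack on a toy 1-D loss.
# Assumptions: grad_fn returns dL/dx for the loss being maximised;
# eps, step, and iters are illustrative hyperparameters.

def pgd_attack(x0, grad_fn, eps=0.1, step=0.02, iters=40):
    """Maximise the loss via signed gradient steps, projecting each
    iterate back into the L-infinity ball of radius eps around x0."""
    x = x0
    for _ in range(iters):
        # Signed ascent step (the FGSM-style inner update of PGD).
        x = x + step * (1.0 if grad_fn(x) >= 0 else -1.0)
        # Projection: clip back into [x0 - eps, x0 + eps].
        x = max(x0 - eps, min(x0 + eps, x))
    return x

# Toy loss L(x) = (x - 1)^2, gradient 2*(x - 1): the attack pushes x
# toward the boundary of the ball that increases the loss.
grad = lambda x: 2.0 * (x - 1.0)
x_adv = pgd_attack(0.5, grad, eps=0.1)
```

Gradient masking, as discussed above, corresponds to `grad_fn` returning uninformative directions, which traps this loop in poor local optima; G-PGA's surrogate-guided loss is designed to avoid exactly that failure mode.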