Deep learning models are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to benign images. Many existing adversarial attack methods achieve strong white-box attack performance but exhibit low transferability when attacking other models. Various momentum iterative gradient-based methods have been shown to be effective in improving adversarial transferability. In this work, we propose an enhanced momentum iterative gradient-based method to further boost adversarial transferability. Specifically, instead of only accumulating the gradient during the iterative process, we additionally accumulate the average gradient of data points sampled in the gradient direction of the previous iteration, so as to stabilize the update direction and escape from poor local maxima. Extensive experiments on the standard ImageNet dataset demonstrate that our method improves the adversarial transferability of momentum-based methods by a large margin of 11.1% on average. Moreover, by incorporating various input transformation methods, the adversarial transferability can be further improved significantly. We also attack several advanced defense models under the ensemble-model setting, and the improvements remain remarkable, at least 7.8% on average.
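To make the described update rule concrete, the following is a minimal PyTorch sketch of the enhanced momentum idea: at each iteration, the gradient is averaged over points sampled along the previous averaged-gradient direction before being folded into the momentum term. The function name, hyperparameter names, and default values (e.g. `num_samples`, `radius`) are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def enhanced_momentum_attack(model, x, y, eps=16/255, steps=10, mu=1.0,
                             num_samples=11, radius=3.5):
    """Sketch of an enhanced momentum iterative attack (L_inf-bounded).

    At each step, gradients are averaged over points sampled along the
    previous averaged-gradient direction, then accumulated as momentum.
    Hyperparameter names and values are assumptions for illustration.
    """
    alpha = eps / steps                      # per-step size
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)                  # accumulated momentum
    g_bar = torch.zeros_like(x)              # previous averaged gradient

    # sampling factors spread along the previous gradient direction
    factors = torch.linspace(-radius, radius, num_samples)

    for _ in range(steps):
        grad_sum = torch.zeros_like(x)
        for c in factors:
            # sample a point in the direction of the previous averaged gradient
            x_near = (x_adv + c * alpha * g_bar).detach().requires_grad_(True)
            loss = F.cross_entropy(model(x_near), y)
            grad_sum += torch.autograd.grad(loss, x_near)[0]
        g_bar = grad_sum / num_samples       # average gradient over samples

        # momentum accumulation with an L1-style normalized gradient
        g = mu * g + g_bar / (g_bar.abs().mean() + 1e-12)

        # take a sign step and project back into the eps-ball around x
        x_adv = x_adv + alpha * g.sign()
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)

    return x_adv.detach()
```

Note that on the first iteration `g_bar` is zero, so all sampled points coincide with the current adversarial example and the update reduces to a plain momentum step; the sampling only takes effect once a gradient direction has been established.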