Deep neural networks are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to original images. Most existing adversarial attack methods achieve nearly 100% attack success rates under the white-box setting, but only relatively low success rates under the black-box setting. To improve the transferability of adversarial examples in the black-box setting, several methods have been proposed, e.g., input diversity, translation-invariant attacks, and momentum-based attacks. In this paper, we propose a method named Gradient Refining, which further improves adversarial transferability by correcting the useless gradients introduced by input diversity through multiple transformations. Our method is generally applicable to many gradient-based attack methods combined with input diversity. Extensive experiments on the ImageNet dataset show that our method achieves an average transfer success rate of 82.07% across three different models under the single-model setting, outperforming other state-of-the-art methods by an average margin of 6.0%. We also applied the proposed method in the CVPR 2021 Unrestricted Adversarial Attacks on ImageNet competition organized by Alibaba and won second place in attack success rate among 1558 teams.
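The abstract describes Gradient Refining only at a high level. As a concrete illustration, below is a minimal PyTorch sketch of one plausible reading: at each attack iteration, gradients are computed on several independently transformed copies of the input (input diversity) and averaged, so that transformation-specific "useless" gradient components cancel out before the momentum update. All names and hyperparameters here (`gradient_refining_attack`, `input_diversity`, `n_copies`, `eps`, etc.) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (an assumption, not the authors' code): Gradient Refining on
# top of momentum + input diversity (MI-FGSM + DI). Per step, the gradient is
# averaged over several independent random transformations so that noisy,
# transformation-specific components introduced by input diversity cancel out.
import torch
import torch.nn.functional as F

def input_diversity(x, low=299, high=330, prob=0.7):
    """DI transform: random resize and pad, applied with probability `prob`."""
    if torch.rand(1).item() > prob:
        return x
    rnd = int(torch.randint(low, high, (1,)).item())
    resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad = high - rnd
    left = int(torch.randint(0, pad + 1, (1,)).item())
    top = int(torch.randint(0, pad + 1, (1,)).item())
    padded = F.pad(resized, (left, pad - left, top, pad - top))
    return F.interpolate(padded, size=x.shape[-2:], mode="nearest")

def gradient_refining_attack(model, x, y, eps=16 / 255, steps=10,
                             mu=1.0, n_copies=5):
    """Untargeted L_inf attack using `n_copies` transformed copies per step."""
    alpha = eps / steps
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # momentum accumulator
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x)
        # Refining step: average gradients over independent DI transformations.
        for _ in range(n_copies):
            loss = F.cross_entropy(model(input_diversity(x_adv)), y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        grad = grad / n_copies
        # Momentum accumulation with an L1-normalized gradient, as in MI-FGSM.
        g = mu * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        x_adv = x_adv.detach() + alpha * g.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```

With `n_copies=1` this reduces to the standard DI + momentum baseline, which makes the contribution of the gradient-averaging step straightforward to ablate.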