Deep neural networks are vulnerable to adversarial examples crafted by applying imperceptible perturbations to their inputs. However, such adversarial examples are most successful in white-box settings, where the model and its parameters are available; finding adversarial examples that transfer to other models, or that can be developed in a black-box setting, is significantly more difficult. In this paper, we propose the Direction-Aggregated adversarial attack, which delivers transferable adversarial examples. Our method aggregates gradient directions during the attack process to keep the generated adversarial examples from overfitting to the white-box model. Extensive experiments on ImageNet show that the proposed method significantly improves the transferability of adversarial examples and outperforms state-of-the-art attacks, especially against adversarially robust models. The best averaged attack success rate of our method reaches 94.6\% against three adversarially trained models and 94.8\% against five defense methods. These results also reveal that current defense approaches do not prevent transferable adversarial attacks.
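For intuition, the following is a minimal PyTorch sketch of one way direction aggregation could look inside an iterative sign-based attack: gradients are collected at several Gaussian-perturbed copies of the current adversarial example and their aggregate determines the update direction. The sampling scheme, the hyperparameters (`n_samples`, `sigma`, `eps`, `alpha`, `steps`), and the function name are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def direction_aggregated_attack(model, x, y, eps=16/255, alpha=2/255,
                                steps=10, n_samples=8, sigma=0.05):
    """Illustrative sketch (not the paper's exact algorithm): an iterative
    L_inf sign attack whose update direction is aggregated over gradients
    at Gaussian-perturbed copies of the input, so each step depends less
    on the exact white-box gradient at a single point."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach()
    for _ in range(steps):
        agg_grad = torch.zeros_like(x_adv)
        for _ in range(n_samples):
            # Sample a nearby point and accumulate its loss gradient.
            x_near = (x_adv + sigma * torch.randn_like(x_adv)).detach()
            x_near.requires_grad_(True)
            loss = loss_fn(model(x_near), y)
            agg_grad += torch.autograd.grad(loss, x_near)[0]
        # Step along the sign of the aggregated direction.
        x_adv = x_adv + alpha * agg_grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps),
                            0, 1).detach()
    return x_adv
```

Under this reading, averaging gradients over a neighborhood smooths out directions that are idiosyncratic to the source model's loss surface, which is the stated mechanism for improving transferability.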