A plethora of attack methods have been proposed to generate adversarial examples, among which iterative methods have been shown to find particularly strong attacks. However, computing an adversarial perturbation for a new data point requires solving a time-consuming optimization problem from scratch, and generating a stronger attack typically requires updating the data point with more iterations. In this paper, we show the existence of a meta adversarial perturbation (MAP), a better initialization that causes natural images to be misclassified with high probability after only a single gradient-ascent update, and we propose an algorithm for computing such perturbations. We conduct extensive experiments, and the empirical results demonstrate that state-of-the-art deep neural networks are vulnerable to meta perturbations. We further show that these perturbations are not only image-agnostic but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
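To make the one-step attack concrete, the following is a minimal sketch (not the paper's reference implementation) of how a precomputed meta perturbation could serve as an initialization for a single FGSM-style gradient-ascent step. The names `v_map`, `alpha`, and `eps` are illustrative assumptions, as are the specific step size and L-infinity budget.

```python
import torch
import torch.nn.functional as F

def one_step_attack_from_map(model, x, y, v_map, alpha=2/255, eps=8/255):
    """Apply a single gradient-ascent step starting from a meta
    adversarial perturbation (MAP) initialization.

    Hypothetical arguments: `v_map` is a precomputed image-agnostic
    perturbation, `alpha` a step size, `eps` an L-inf budget.
    """
    # Initialize the adversarial example with the meta perturbation.
    x_adv = (x + v_map).clamp(0, 1).detach().requires_grad_(True)

    # One step of gradient ascent on the classification loss.
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = x_adv + alpha * grad.sign()

    # Project back into the eps-ball around the clean input and the valid pixel range.
    x_adv = x + (x_adv - x).clamp(-eps, eps)
    return x_adv.clamp(0, 1).detach()
```

Because `v_map` is shared across images (and, per the abstract, across architectures), the expensive iterative optimization is amortized into the initialization, leaving only this single cheap update per new data point.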