Adversarial examples are inputs intentionally crafted to fool a deep neural network. Recent studies have proposed unrestricted adversarial attacks, which are not norm-constrained. However, previous unrestricted attack methods still struggle to fool real-world applications in a black-box setting. In this paper, we present a novel method for generating unrestricted adversarial examples using a GAN, where the attacker can access only the top-1 final decision of the classification model. Our method, Latent-HSJA, efficiently leverages the advantages of a decision-based attack in the latent space and successfully manipulates latent vectors to fool the classification model. Through extensive experiments, we demonstrate that the proposed method efficiently evaluates the robustness of classification models with limited queries in a black-box setting. First, we show that our targeted attack is query-efficient at producing unrestricted adversarial examples against a facial identity recognition model containing 307 identities. We then show that the proposed method can also successfully attack a real-world celebrity recognition service.
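To make the idea concrete, the sketch below shows what a HopSkipJump-style decision-based attack looks like when it is run in a GAN's latent space rather than in pixel space. This is a minimal illustration under stated assumptions, not the authors' implementation: `G`, `top1`, `latent_hsja`, `binary_search`, and all hyperparameters are hypothetical stand-ins, with a toy generator and a toy top-1 classifier replacing a pretrained GAN and a real black-box model.

```python
# Minimal sketch (not the authors' code) of a decision-based attack run in a
# GAN's latent space, in the spirit of Latent-HSJA. Only the top-1 label of
# the black-box model is ever observed; each query costs one API call.
import numpy as np

LATENT_DIM = 512  # illustrative latent dimensionality

def G(z):                       # hypothetical generator: latent -> image
    return np.tanh(z)           # stand-in for a pretrained GAN

def top1(x):                    # hypothetical black-box classifier
    return int(x.sum() > 0)     # only the final top-1 decision is visible

def is_adversarial(z, target):
    """One query: does the generated image receive the target label?"""
    return top1(G(z)) == target

def binary_search(z_adv, z_src, target, tol=1e-3):
    """Move z_adv toward z_src along a line while keeping the target label.
    Assumes z_adv is labeled `target` and z_src is not."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        z = (1 - mid) * z_src + mid * z_adv
        if is_adversarial(z, target):
            hi = mid            # still adversarial: tighten toward z_src
        else:
            lo = mid
    return (1 - hi) * z_src + hi * z_adv

def latent_hsja(z_src, z_adv, target, steps=10, n_dirs=32, step=0.5, rng=None):
    """HopSkipJump-style loop in latent space: project onto the decision
    boundary, then estimate a gradient direction from top-1 decisions only."""
    rng = rng or np.random.default_rng(0)
    for _ in range(steps):
        z_adv = binary_search(z_adv, z_src, target)
        dirs = rng.standard_normal((n_dirs, LATENT_DIM))
        signs = np.array([1.0 if is_adversarial(z_adv + step * d, target)
                          else -1.0 for d in dirs])
        grad = (signs[:, None] * dirs).mean(axis=0)   # Monte-Carlo estimate
        grad /= np.linalg.norm(grad) + 1e-12
        z_adv = z_adv + step * grad                   # step along the estimate
    return binary_search(z_adv, z_src, target)
```

Because every candidate is decoded by the generator, each query stays on the GAN's image manifold, so the adversarial example remains a semantically plausible image rather than norm-bounded pixel noise; this is the advantage of running the decision-based search in latent space.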