In evasion attacks against deep neural networks (DNNs), the attacker generates adversarial instances that are visually indistinguishable from benign samples and sends them to the target DNN to trigger misclassifications. In this paper, we propose Argos, a novel multi-view adversarial image detector, based on a novel observation: an adversarial instance carries two "souls", i.e., the visually unchanged content, which corresponds to the true label, and the added invisible perturbation, which corresponds to the misclassified label. Such inconsistencies can be further amplified through an autoregressive generative approach that generates images from seed pixels selected from the original image, a chosen label, and pixel distributions learned from the training data. The generated images (i.e., the "views") will deviate significantly from the original one if the label is adversarial, exposing the inconsistencies that Argos is designed to detect. To this end, Argos first amplifies the discrepancies between the visual content of an image and its attack-induced misclassified label using a set of regeneration mechanisms, and then identifies an image as adversarial if the regenerated views deviate beyond a preset threshold. Our experimental results show that Argos significantly outperforms two representative adversarial detectors in both detection accuracy and robustness against six well-known adversarial attacks. Code is available at: https://github.com/sohaib730/Argos-Adversarial_Detection
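The detection pipeline described above can be sketched at a very high level: regenerate several views of the input conditioned on the model's predicted label, then flag the input as adversarial when the views deviate from the original beyond a threshold. The sketch below is a toy illustration only; `regenerate_views` is a hypothetical stand-in for the paper's autoregressive, label-conditioned generator (here simulated with label-dependent noise), and all names and the threshold value are assumptions, not the authors' implementation.

```python
import numpy as np

def regenerate_views(image, label, num_views=3, seed=0):
    # Hypothetical stand-in for Argos's regeneration step: each view would
    # start from seed pixels of the original image and be completed under
    # the given label using learned pixel distributions. Here we merely
    # simulate label-dependent drift so the detector logic is runnable.
    rng = np.random.default_rng(seed)
    shift = 0.0 if label == 0 else 0.5  # an inconsistent label drifts the views
    return [image + shift + 0.01 * rng.standard_normal(image.shape)
            for _ in range(num_views)]

def detect_adversarial(image, predicted_label, threshold=0.1):
    # Flag the input as adversarial when the regenerated views deviate
    # from the original image beyond the preset threshold (assumed metric:
    # mean squared deviation averaged over views).
    views = regenerate_views(image, predicted_label)
    deviation = np.mean([np.mean((v - image) ** 2) for v in views])
    return bool(deviation > threshold)

image = np.zeros((8, 8))
detect_adversarial(image, predicted_label=0)  # label consistent with content
detect_adversarial(image, predicted_label=1)  # label inconsistent with content
```

In this toy setup, a label consistent with the image content yields views close to the original (below the threshold), while an inconsistent label produces views that drift away, which is the amplified discrepancy the detector keys on.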