Recent years have witnessed unprecedented success achieved by deep learning models in the field of computer vision. However, their vulnerability to carefully crafted adversarial examples has also attracted increasing attention from researchers. Motivated by the observation that adversarial examples arise from the non-robust features that models learn from the original dataset, we propose the concepts of salient feature (SF) and trivial feature (TF). The former represents class-related features, while the latter is usually exploited to mislead the model. We extract these two features with a coupled generative adversarial network model and put forward a novel detection and defense method named salient feature extractor (SFE) to defend against adversarial attacks. Concretely, detection is realized by separating the input into SF and TF and comparing the difference between them, while defense is achieved by re-identifying the SF to recover the correct label. Extensive experiments are carried out on the MNIST, CIFAR-10, and ImageNet datasets, where SFE shows state-of-the-art effectiveness and efficiency compared with baselines. Furthermore, we provide an interpretable understanding of the detection and defense process.
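To make the pipeline concrete, the following is a minimal sketch of how the detection and defense steps described above could be wired together. It is not the authors' implementation: the generator and classifier architectures, the `SFEDefense` class, the distance-based detection score, and the threshold value are all illustrative assumptions.

```python
# Minimal sketch of the SFE pipeline (illustrative only; architectures,
# detection rule, and threshold are assumptions, not the paper's exact setup).
import torch
import torch.nn as nn


class FeatureGenerator(nn.Module):
    """Toy generator mapping an image to a same-sized feature image."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)


class SFEDefense:
    """Detection: flag inputs whose SF/TF decomposition diverges too much.
    Defense: re-classify the salient-feature image to recover the label."""
    def __init__(self, g_sf, g_tf, classifier, threshold=0.5):
        self.g_sf = g_sf              # generator extracting salient features (SF)
        self.g_tf = g_tf              # generator extracting trivial features (TF)
        self.classifier = classifier  # downstream classifier
        self.threshold = threshold    # illustrative detection threshold

    @torch.no_grad()
    def __call__(self, x):
        sf = self.g_sf(x)             # class-related component
        tf = self.g_tf(x)             # component attacks tend to exploit
        # Detection score: per-sample distance between SF and TF (assumed rule).
        score = (sf - tf).flatten(1).norm(dim=1)
        is_adversarial = score > self.threshold
        # Defense: predict the label from the salient features only.
        label = self.classifier(sf).argmax(dim=1)
        return is_adversarial, label


if __name__ == "__main__":
    torch.manual_seed(0)
    clf = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy MNIST-sized classifier
    defense = SFEDefense(FeatureGenerator(), FeatureGenerator(), clf)
    flags, labels = defense(torch.rand(4, 1, 28, 28))
    print(flags, labels)
```

In practice the two generators would be trained jointly (e.g., in a coupled GAN setup) so that the SF branch preserves class-related content while the TF branch absorbs the perturbation, and the detection threshold would be calibrated on clean data.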