Adversarial examples pose serious security threats to convolutional neural networks (CNNs). Most defense algorithms counter these threats by finding differences between original images and adversarial examples. However, the differences found do not contain class-related features, so these defense algorithms can only detect adversarial examples without recovering the correct labels. To address this, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and class-related features. The method is inspired by an observed phenomenon, Adversarial Feature Separability (AFS): the difference between the feature maps of an original image and its adversarial example grows larger in deeper layers. Building on AFS, we develop an adversarial example recognition framework that both detects adversarial examples and recovers their correct labels. In our experiments, detection and classification of adversarial examples via AFGs achieves an accuracy above 90.01\% across various attack scenarios. To the best of our knowledge, ours is the first method that addresses both attack detection and label recovery. AFG offers a new data-driven perspective on improving the robustness of CNNs. The source code is available at https://github.com/GeoX-Lab/Adv_Fea_Genome.
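The AFS observation above can be illustrated with a minimal, self-contained sketch. Note this is a toy stand-in, not the paper's method: a small random network replaces a trained CNN, and the "adversarial example" is just a small perturbation, so the sketch only shows how one would measure layer-wise feature divergence, not the AFS trend itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a CNN: three random linear+ReLU layers.
# (Hypothetical; the paper's experiments use trained CNNs.)
weights = [rng.standard_normal((32, 32)) * 0.5 for _ in range(3)]

def forward(x):
    """Return the activation at every layer for input vector x."""
    acts = []
    for w in weights:
        x = np.maximum(w @ x, 0.0)  # linear map followed by ReLU
        acts.append(x)
    return acts

x_clean = rng.standard_normal(32)
x_adv = x_clean + 0.05 * rng.standard_normal(32)  # small perturbation

# Layer-wise L2 distance between the clean and perturbed feature trajectories;
# under AFS, this distance would grow with depth for real adversarial examples.
dists = [np.linalg.norm(a - b) for a, b in zip(forward(x_clean), forward(x_adv))]
print(dists)
```

With a trained network, the same per-layer distances would be the raw signal from which AFS is observed; extracting them from a real model typically amounts to recording intermediate activations during a forward pass.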