Adversarial examples reveal the vulnerability and unexplainable nature of neural networks, so studying defenses against them is of considerable practical importance. Most adversarial examples that cause networks to misclassify are imperceptible to humans. In this paper, we propose a defense model that trains the classifier into a human-perception classification model with shape preference. The proposed model, comprising a texture transfer network (TTN) and an auxiliary defense generative adversarial network (GAN), is called Human-perception Auxiliary Defense GAN (HAD-GAN). The TTN extends the texture samples of a clean image and helps the classifier focus on its shape, while the GAN forms a training framework for the model and generates the necessary images. A series of experiments on MNIST, Fashion-MNIST and CIFAR10 shows that the proposed model outperforms state-of-the-art defense methods in network robustness and significantly improves the defense capability against adversarial examples.
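To make the threat model concrete, the sketch below generates a gradient-sign (FGSM-style) adversarial example against a tiny hand-set linear classifier. This is only an illustration of the attack the abstract refers to, not the HAD-GAN defense; the classifier weights, the 2-D input, and the perturbation budget are all hypothetical.

```python
import numpy as np

# Hypothetical fixed linear classifier: predict class 1 when w . x > 0.
w = np.array([1.0, -1.0])
x = np.array([0.3, 0.1])   # clean input; w . x = 0.2 > 0, so class 1

def predict(x):
    return int(w @ x > 0)

# For the logistic loss with true label y = 1, the gradient of the loss
# w.r.t. the input is -(1 - sigmoid(w . x)) * w. An FGSM-style attack
# steps along the sign of this gradient to maximize the loss.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
grad_x = -(1.0 - sigmoid(w @ x)) * w

eps = 0.3                           # hypothetical perturbation budget
x_adv = x + eps * np.sign(grad_x)   # perturbed input: [0.0, 0.4]

# The small, structured perturbation flips the prediction from 1 to 0,
# even though x_adv differs from x by at most eps per coordinate.
clean_pred, adv_pred = predict(x), predict(x_adv)
```

Here the perturbation is bounded in the L-infinity norm by `eps`, which is why such examples can remain imperceptible to humans on image data while reliably fooling a texture-sensitive classifier.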