It is well known that deep neural networks are susceptible to adversarial examples whose perturbations are imperceptible to humans. Various defenses have been proposed to improve adversarial robustness, among which adversarial training methods are the most effective. However, most of these methods treat training samples independently and demand a tremendous number of samples to train a robust network, ignoring the latent structural information among them. In this work, we propose a novel Local Structure Preserving (LSP) regularization, which aims to preserve the local structure of the input space in the learned embedding space. In this manner, the attacking effect of adversarial samples lying in the vicinity of clean samples can be alleviated. We present strong empirical evidence that, with or without adversarial training, our method consistently improves adversarial robustness on several image classification datasets compared to the baselines and some state-of-the-art approaches, thus providing a promising direction for future research.
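To make the idea concrete, below is a minimal sketch of one possible instantiation of a local-structure-preserving penalty: within a mini-batch, samples that are close in input space are encouraged to stay close in the embedding space via a kNN-weighted, graph-Laplacian-style term. This is an illustrative assumption, not the paper's exact formulation; the function `lsp_penalty` and the hyperparameters `k`, `sigma`, and `lam` are hypothetical names, and the model is assumed to expose its embeddings.

```python
# Hypothetical sketch of a local-structure-preserving regularizer (not the
# paper's exact formulation): penalize embedding distances between samples
# that are nearest neighbours in input space.
import torch
import torch.nn.functional as F

def lsp_penalty(x, z, k=5, sigma=1.0):
    """x: (B, D_in) flattened inputs; z: (B, D_emb) learned embeddings."""
    b = x.size(0)
    dx = torch.cdist(x, x)                       # pairwise input-space distances
    # Gaussian affinities over each sample's k nearest neighbours (index 0 is self).
    knn = dx.topk(k + 1, largest=False).indices[:, 1:]
    w = torch.zeros_like(dx)
    rows = torch.arange(b).unsqueeze(1).expand_as(knn)
    w[rows, knn] = torch.exp(-dx[rows, knn] ** 2 / (2 * sigma ** 2))
    w = 0.5 * (w + w.t())                        # symmetrise the affinity graph
    dz = torch.cdist(z, z) ** 2                  # squared embedding distances
    # Neighbours in input space should remain close in embedding space.
    return (w * dz).sum() / w.sum().clamp_min(1e-8)

# Usage inside a training step (model assumed to return embeddings and logits):
# z, logits = model(images)
# loss = F.cross_entropy(logits, labels) + lam * lsp_penalty(images.flatten(1), z)
```

Under this sketch, an adversarial example sitting near a clean sample in input space inherits a high affinity weight, so the penalty discourages the embedding from separating them, which is one way to read the abstract's claim about alleviating nearby adversarial perturbations.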