Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples: a well-trained model can be easily fooled by adding small perturbations to the original data. One hypothesis for the existence of adversarial examples is the off-manifold assumption, which states that adversarial examples lie off the data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this paper, we revisit the off-manifold assumption and study the question: to what extent is the poor performance of neural networks against adversarial attacks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two types of approximated on-manifold adversarial examples on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples achieve higher attack success rates than off-manifold adversarial examples against both standard-trained and adversarially-trained models. On synthetic datasets, we theoretically prove that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores on-manifold adversarial examples. Furthermore, our analysis shows that the properties derived theoretically can also be observed in practice. These results suggest that on-manifold adversarial examples are important, and that more attention should be paid to them when training robust models.
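To make the distinction concrete, the sketch below contrasts a standard off-manifold attack (PGD in pixel space) with one common way to approximate an on-manifold attack: perturbing the latent code of a pretrained generative model so that every candidate stays on the learned manifold. This is a minimal illustration only; the abstract does not specify the paper's exact manifold approximations, and the `model`, `generator`, and hyperparameters here are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def off_manifold_pgd(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard PGD: perturb directly in input (pixel) space within an l_inf ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # ascend the loss
            delta.clamp_(-eps, eps)                   # stay in the l_inf ball
            delta.data = (x + delta).clamp(0, 1) - x  # keep pixels valid
        delta.grad.zero_()
    return (x + delta).detach()

def on_manifold_pgd(model, generator, z, y, eps=0.3, alpha=0.05, steps=10):
    """Approximate on-manifold attack: perturb the latent code z of a pretrained
    generator, so every candidate x = G(z + delta) lies on the learned manifold."""
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):
        x_adv = generator(z + delta)                  # decode back to input space
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)                   # bounded latent perturbation
        delta.grad.zero_()
    return generator(z + delta).detach()
```

The key design difference is where the perturbation budget is imposed: in pixel space for the off-manifold attack, and in the generator's latent space for the on-manifold attack, so the latter can only move along directions the generative model can represent.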