Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks. In this paper we unveil a surprising property of adversarial noises when they are put together: adversarial noises crafted by one-step gradient methods are linearly separable when equipped with the corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and for the neural tangent kernel setup where the parameters stay close to initialization. The proof idea is to show that the label information can be efficiently backpropagated to the input while preserving linear separability. Our theory and experimental evidence further show that a linear classifier trained on the adversarial noises of the training data can accurately classify the adversarial noises of the test data, indicating that adversarial noises in effect inject a distributional perturbation into the original data distribution. Furthermore, we empirically demonstrate that adversarial noises may become less linearly separable when the above conditions are violated, yet they remain much easier to classify than the original features.
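The following is a minimal sketch, not the paper's code, of the setup the abstract describes: one-step (FGSM-style) adversarial noises are generated for a randomly initialized two-layer network, and a linear classifier is fit on the (noise, label) pairs to probe their linear separability. The synthetic Gaussian data, the network width, and the perturbation size eps are illustrative assumptions.

```python
# Sketch: craft one-step adversarial noises and test their linear separability.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

torch.manual_seed(0)
d, m, n = 100, 512, 2000                      # input dim, hidden width, sample count (illustrative)

# Randomly initialized two-layer network, kept at initialization as in the theory.
model = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))

# Synthetic binary-labeled inputs; in the paper this role is played by real training data.
x = torch.randn(n, d)
y = torch.randint(0, 2, (n, 1)).float()

# One-step gradient (FGSM-style) adversarial noise: eps times the sign of the input gradient.
x.requires_grad_(True)
loss = nn.BCEWithLogitsLoss()(model(x), y)
loss.backward()
eps = 0.1
noise = eps * x.grad.sign()                   # the noise itself, not the perturbed input x + noise

# Fit a linear classifier on (noise, label) pairs; high training accuracy
# indicates the noises are (close to) linearly separable.
features = noise.detach().numpy()
labels = y.squeeze(1).numpy()
clf = LogisticRegression(max_iter=1000)
clf.fit(features, labels)
print("training accuracy on adversarial noises:", clf.score(features, labels))
```

In the same spirit, noises generated from held-out test inputs could be scored with the already-fitted classifier to check the distributional claim made above.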