Despite a great deal of research, it is still not well understood why trained neural networks are highly vulnerable to adversarial examples. In this work we focus on two-layer neural networks trained on data that lie on a low-dimensional linear subspace. We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions. Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.
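The phenomenon described above can be illustrated empirically. The following is a minimal sketch, not the paper's construction or experimental setup: it trains a small two-layer ReLU network with standard gradient descent on synthetic data lying on a low-dimensional linear subspace, then measures the component of the input gradient orthogonal to that subspace and perturbs a training point along it. All dimensions, widths, and hyperparameters below are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's): data on a d-dim subspace of R^D,
# two-layer ReLU network trained with plain gradient descent.
import torch

torch.manual_seed(0)
D, d, n, width = 100, 5, 500, 200            # ambient dim, subspace dim, samples, hidden width

# Orthonormal basis U of a random d-dimensional subspace; inputs x = U z lie on it exactly.
U, _ = torch.linalg.qr(torch.randn(D, d))
Z = torch.randn(n, d)
X = Z @ U.T
y = torch.sign(Z[:, 0]).unsqueeze(1)         # labels depend only on the subspace coordinates

net = torch.nn.Sequential(
    torch.nn.Linear(D, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.SGD(net.parameters(), lr=0.05)
for _ in range(500):                         # standard gradient training
    opt.zero_grad()
    loss = torch.nn.functional.soft_margin_loss(net(X), y)
    loss.backward()
    opt.step()

# Input gradient at a training point, split into on-subspace and orthogonal parts.
x = X[0].clone().requires_grad_(True)
net(x).backward()
g = x.grad
P = U @ U.T                                  # projector onto the data subspace
g_on, g_orth = P @ g, g - P @ g
print(f"|grad on subspace| = {g_on.norm():.3f}, |grad orthogonal| = {g_orth.norm():.3f}")

# A small L2 step along the orthogonal gradient direction leaves the on-subspace content
# of x unchanged; if the orthogonal gradient is large, it can flip the prediction.
eps = 0.5
x_adv = x.detach() - eps * torch.sign(net(x.detach())) * g_orth / g_orth.norm()
print("f(x) =", net(x.detach()).item(), " f(x_adv) =", net(x_adv).item())
```

A large printed orthogonal-gradient norm relative to the on-subspace norm is what the abstract refers to as non-robustness in directions orthogonal to the data; shrinking the initialization scale or adding $L_2$ regularization to the optimizer would be the corresponding mitigations to try in this sketch.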