It is well known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be made near zero by some of these methods, all existing algorithms still suffer from a large robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of the expressive power of deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. This result holds even if the data is linearly separable (which means achieving standard generalization is easy), and more generally for any parameterized function class whose VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ on the network size needed to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also prove a lower bound that grows exponentially in $k$; the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network sizes required for low robust training error and low robust generalization error, our results reveal that the hardness of robust generalization may stem from the limited expressive power of practical models.