It is well known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms has been proposed. However, although the robust training error can be driven to near zero by some of these methods, all existing algorithms still suffer from a large robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of the expressive power of deep neural networks. Specifically, for binary classification problems with well-separated data, we show that for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, a constant robust generalization gap persists unless the size of the neural network is exponential in the data dimension $d$. Even if the data is linearly separable, which means that achieving a low clean generalization error is easy, we can still prove an $\exp(\Omega(d))$ lower bound for robust generalization. In general, our exponential lower bounds hold for a variety of neural network families and other function classes as well, as long as their VC dimension is at most polynomial in the number of parameters. Moreover, we establish an improved upper bound of $\exp(\mathcal{O}(k))$ on the network size needed to achieve a low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also prove a lower bound that grows exponentially in $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network sizes required for low robust training error and for low robust generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
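To make the scaling explicit, the central separation can be summarized schematically as
\[
\underbrace{\text{mild over-parameterization}}_{\text{suffices for near-zero robust \emph{training} error}}
\quad \text{vs.} \quad
\underbrace{\text{network size } \exp(\Omega(d))}_{\text{necessary for a small robust \emph{generalization} gap}},
\]
where the precise network family, separation margin, and constants are those stated in the formal theorems; this display is only an illustrative restatement of the claims above, not an additional result.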