It is well-known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of expressive power for deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension $d$. Even if the data is linear separable, which means achieving low clean generalization error is easy, we can still prove an $\exp({\Omega}(d))$ lower bound for robust generalization. Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network size to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a lower bound that grows exponentially with respect to $k$ -- the curse of dimensionality is inevitable. By demonstrating an exponential separation between the network size for achieving low robust training and generalization error, our results reveal that the hardness of robust generalization may stem from the expressive power of practical models.
翻译:众所周知,现代神经网络容易受到对抗性实例的影响。 为了缓解这一问题,已经提出了一系列强有力的学习算法。 但是,尽管强力培训错误可能在某些方法中接近零,但所有现有算法都会导致高度稳健的一般化错误。 在本文中,我们从深度神经网络的表达力角度从理论上理解这种令人费解的现象。 具体地说,对于以良好分离的数据构成的二进制分类问题,我们表明,对于ReLU网络来说,虽然轻微的超度参数化足以保证高稳健的培训准确性,但始终存在着强健的通用差距,除非在数据层面神经网络的规模是指数性的指数化。即使数据是线性分解的,这意味着实现低清晰的概括性错误很容易,但我们仍然可以证明美元(@Omega}(d)) 美元对于稳健的概括化,我们为网络规模设定了一个更好的上限值为$(mathcal cal), 当数据在直径直径直径直径直径直径直径直径直径直的网络上,我们也可以以直径直径直径直径直的深度显示一个直径直径直径直径直径直径直的模型。