Multi-layer feedforward networks have been used to approximate a wide range of nonlinear functions. A fundamental problem is to understand the learnability of a network model through its statistical risk, that is, the expected prediction error on future data. To the best of our knowledge, the rates of convergence established for neural networks in existing work are no faster than the order of $n^{-1/4}$ for a sample size of $n$. In this paper, we show that a class of variation-constrained neural networks, with arbitrary width, can achieve a near-parametric rate of $n^{-1/2+\delta}$ for an arbitrarily small positive constant $\delta$. This is equivalent to a rate of $n^{-1+2\delta}$ under the mean squared error. The same rate is also observed in numerical experiments. The result indicates that the neural function space needed for approximating smooth functions may not be as large as is often perceived. Our result also provides insight into the phenomenon that deep neural networks do not easily suffer from overfitting when the number of neurons and learnable parameters grows rapidly with $n$ or even surpasses $n$. We also discuss how the rate of convergence depends on other network parameters, including the input dimension, the number of network layers, and the coefficient norm.
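The relation between the two stated rates, $n^{-1/2+\delta}$ for the risk and $n^{-1+2\delta}$ under the mean squared error, can be checked numerically: squaring a quantity doubles its slope on a log-log plot. The following is a minimal sketch using synthetic values only; the helper `fit_rate_exponent`, the constant `2.0`, the value `delta = 0.05`, and the sample sizes are all illustrative assumptions, not the experiments or the data referred to in the abstract.

```python
import math


def fit_rate_exponent(sample_sizes, risks):
    """Return the least-squares slope of log(risk) against log(n).

    For a risk that behaves like C * n**a, the slope estimates a.
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(r) for r in risks]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical, synthetic inputs: delta and the constant 2.0 are assumptions
# chosen for illustration, not values taken from the paper.
delta = 0.05
sample_sizes = [100, 1_000, 10_000, 100_000]

# Risk on the original scale, decaying like n^(-1/2 + delta).
root_risk = [2.0 * n ** (-0.5 + delta) for n in sample_sizes]
# The same risk on the squared (mean squared error) scale.
squared_risk = [r ** 2 for r in root_risk]

root_slope = fit_rate_exponent(sample_sizes, root_risk)
squared_slope = fit_rate_exponent(sample_sizes, squared_risk)

print(f"exponent on the original scale: {root_slope:.3f}")
print(f"exponent on the squared scale:  {squared_slope:.3f}")
```

With real data, the same slope fit could be applied to empirical risks measured at several sample sizes, although noise and constants would make the estimated exponent only approximate.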