Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, which demonstrates the expressive capabilities of neural networks but not necessarily the performance of networks trained in practice. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in one space dimension ($1d$) with non-convex weight optimization by gradient descent. We consider finite-width networks and the infinite-sample limit, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in the approximation rate compared to the best possible rates.
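The abstract does not fix a concrete parametrization; as a hedged illustration of the setting it describes, one common form of a shallow network of finite width $n$ in $1d$ with activation $\sigma$ is
\[
f_\theta(x) = \sum_{k=1}^{n} a_k \, \sigma(w_k x + b_k), \qquad \theta = (a_k, w_k, b_k)_{k=1}^{n},
\]
trained in the infinite-sample limit on the population squared loss
\[
L(\theta) = \frac{1}{2} \int \bigl(f_\theta(x) - f(x)\bigr)^2 \, d\mu(x), \qquad \theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t),
\]
where the target $f$, the data measure $\mu$, and the step size $\eta$ are assumed rather than taken from the abstract. The loss is non-convex in the inner weights $(w_k, b_k)$, which is the source of the non-convex optimization referred to above.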