Today, various forms of neural networks are trained to perform approximation tasks in many fields. However, the resulting estimators are not yet well understood in function space. Empirical results suggest that typical training algorithms favor regularized solutions. These observations motivate us to analyze the properties of neural networks found by gradient descent initialized close to zero, a procedure frequently employed in practice. As a starting point, we consider one-dimensional (shallow) ReLU neural networks in which the hidden-layer weights are chosen randomly and only the terminal layer is trained. First, we rigorously show that for such networks, ridge-regularized regression corresponds in function space to regularizing the estimate's second derivative, for fairly general loss functionals. For least-squares regression, we show that the trained network converges to the smooth spline interpolant of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a correspondence between early-stopped gradient descent and smoothing spline regression. Our analysis may give valuable insight into the properties of solutions obtained by gradient descent methods in more general settings.
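To make the setting concrete, the following minimal sketch fits the kind of model described above: a shallow 1-D ReLU network whose hidden weights and biases are drawn at random and frozen, with only the terminal layer fitted by ridge-regularized least squares. The sampling distributions for the hidden parameters, the ridge parameter, and the toy target function are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def make_random_relu_features(n_hidden=2000, rng=rng):
    """Feature map x -> ReLU(w*x + b) with randomly drawn, fixed w and b.

    The distributions of w and b are assumptions made for this sketch.
    """
    w = rng.normal(size=n_hidden)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    return lambda x: relu(np.outer(x, w) + b)

# Toy training data (illustrative target, not from the paper)
x_train = np.linspace(-1.0, 1.0, 20)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.normal(size=x_train.size)

phi = make_random_relu_features()
Phi = phi(x_train)                      # (n_samples, n_hidden) design matrix

# Ridge-regularized least squares on the terminal-layer weights only
lam = 1e-3                              # ridge parameter (assumed value)
c = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y_train)

# Network prediction on a fine grid; the abstract's result says that, in the
# wide-network limit, this kind of estimator behaves like a spline smoother
# penalizing the second derivative of the fitted function.
x_grid = np.linspace(-1.0, 1.0, 200)
y_pred = phi(x_grid) @ c
```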