We study the geometry of the set of global minima of the loss landscape of overparametrized neural networks. In typical optimization problems, the loss function is either convex, in which case there is a unique global minimum, or nonconvex, with a discrete set of global minima. In this paper, we prove that in the overparametrized regime a shallow neural network can interpolate any data set; that is, the loss function attains a global minimum value of zero, provided the activation function is not a polynomial of low degree. Moreover, whenever such a global minimum exists, the locus of global minima contains infinitely many points. Furthermore, we characterize the Hessian of the loss function evaluated at a global minimum, and in the last section we describe a practical probabilistic method for finding the interpolation point.
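For concreteness, the interpolation claim can be stated in the standard one-hidden-layer setting; the symbols below are illustrative, as the paper's exact notation is not reproduced here. Given data $\{(x_i, y_i)\}_{i=1}^{n}$ and a shallow network of width $m$,
\[
f_\theta(x) \;=\; \sum_{j=1}^{m} a_j\, \sigma\!\bigl(w_j^{\top} x + b_j\bigr),
\qquad
L(\theta) \;=\; \sum_{i=1}^{n} \bigl(f_\theta(x_i) - y_i\bigr)^{2},
\]
interpolation means $f_\theta(x_i) = y_i$ for all $i$, equivalently $L(\theta) = 0$. The abstract asserts that such a $\theta$ exists whenever the width $m$ is large enough relative to $n$ (the overparametrized regime) and the activation $\sigma$ is not a polynomial of low degree.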
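The paper's probabilistic method is not detailed in this abstract; as a minimal sketch of one natural randomized approach consistent with the claim, the hidden-layer parameters can be drawn at random and the outer weights solved for linearly. With a non-polynomial activation and width $m \ge n$, the resulting feature matrix is generically of full row rank, so an exact interpolant (zero loss) exists almost surely. All names below are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def interpolate(X, y, m, sigma=np.tanh):
    """Fit an overparametrized shallow net f(x) = sum_j a_j * sigma(w_j.x + b_j)
    by sampling hidden weights at random and solving for the outer weights."""
    n, d = X.shape
    assert m >= n, "overparametrized regime: width must be at least n"
    W = rng.normal(size=(d, m))   # random hidden weights
    b = rng.normal(size=m)        # random hidden biases
    Phi = sigma(X @ W + b)        # n x m feature matrix Phi[i, j] = sigma(w_j.x_i + b_j)
    # Min-norm solution of Phi a = y; residual is ~0 when Phi has full row rank.
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return W, b, a

# Toy data set: n = 20 points in R^3 with arbitrary labels, width m = 50.
n, d, m = 20, 3, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W, b, a = interpolate(X, y, m)
preds = np.tanh(X @ W + b) @ a
print("squared loss at the fitted point:", np.sum((preds - y) ** 2))  # numerically ~ 0

The fitted point is one element of the locus of global minima; re-running with a different random seed lands on a different interpolating parameter, consistent with the abstract's statement that this locus contains infinitely many points.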