In this paper, we theoretically prove that deep ReLU neural networks do not get stuck in spurious local minima of the loss landscape under the Neural Tangent Kernel (NTK) regime, that is, in the gradient descent training dynamics of deep ReLU neural networks whose parameters are initialized from a normal distribution, in the limit as the widths of the hidden layers tend to infinity.
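For concreteness, a minimal sketch of the standard NTK setup that this statement refers to is as follows; the notation $f_\theta$, $W^{(\ell)}$, $m_\ell$, and $\Theta_\theta$ is illustrative and not taken from this paper. A depth-$L$ ReLU network with $\sigma(z) = \max(0, z)$ applied entrywise and normally initialized weights is
\[
  f_\theta(x) = W^{(L)} \sigma\!\left( W^{(L-1)} \cdots \sigma\!\left( W^{(1)} x \right) \right),
  \qquad W^{(\ell)}_{ij} \sim \mathcal{N}\!\left(0, \tfrac{c_\ell}{m_\ell}\right),
\]
where $m_\ell$ is the width of the $\ell$-th hidden layer, and its empirical neural tangent kernel is
\[
  \Theta_\theta(x, x') = \nabla_\theta f_\theta(x)^{\top} \nabla_\theta f_\theta(x').
\]
The NTK regime is the limit $m_1, \dots, m_{L-1} \to \infty$, in which $\Theta_\theta$ remains essentially constant along the gradient descent trajectory, so the training dynamics are governed by the kernel at initialization.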