This article presents a new criterion for convergence of gradient descent to a global minimum. The criterion is used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points. The main difference from prior work is that the width of the network can remain a fixed number, rather than growing unrealistically as some multiple or power of the number of data points.
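The setting above can be illustrated with a small numerical sketch: a fixed-width one-hidden-layer network with a tanh activation (smooth and strictly increasing), trained by plain gradient descent on n random data points in dimension d with d ≥ n. The specific sizes (n = 4, d = 8, width m = 5), the learning rate, and the random data are illustrative assumptions, not the paper's construction; the sketch only shows the training loss decreasing under these conditions, not the paper's convergence guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): n data points in dimension d >= n,
# and a fixed width m that does not grow with n.
n, d, m = 4, 8, 5
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# One hidden layer; tanh is smooth and strictly increasing.
W1 = rng.standard_normal((d, m)) / np.sqrt(d)
w2 = rng.standard_normal(m) / np.sqrt(m)

lr = 0.05
losses = []
for step in range(3000):
    H = np.tanh(X @ W1)          # hidden activations, shape (n, m)
    pred = H @ w2                # network outputs, shape (n,)
    r = pred - y                 # residuals
    losses.append(0.5 * np.mean(r ** 2))
    # Gradients of the mean-squared-error loss by backpropagation.
    g_pred = r / n
    g_w2 = H.T @ g_pred
    g_H = np.outer(g_pred, w2)
    g_W1 = X.T @ (g_H * (1 - H ** 2))  # tanh'(z) = 1 - tanh(z)^2
    w2 -= lr * g_w2
    W1 -= lr * g_W1

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

With these toy sizes the loss drops steadily toward zero, consistent with the regime the abstract describes, though of course a single run is an illustration rather than evidence for the theorem.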