Gradient descent optimization algorithms are the standard ingredients used to train artificial neural networks (ANNs). Even though a huge number of numerical simulations indicate that gradient descent optimization methods do indeed converge in the training of ANNs, to this day there is no rigorous theoretical analysis which proves (or disproves) this conjecture. In particular, even in the case of the most basic variant of gradient descent optimization algorithms, the plain vanilla gradient descent method, it remains an open problem to prove or disprove the conjecture that gradient descent converges in the training of ANNs. In this article we solve this problem in the special situation where the target function under consideration is a constant function. More specifically, in the case of constant target functions we prove that, in the training of rectified fully-connected feedforward ANNs with one hidden layer, the risk of the gradient descent method does indeed converge to zero. Our mathematical analysis strongly exploits the fact that the rectifier function is the activation function used in the considered ANNs. A key contribution of this work is to explicitly specify a Lyapunov function for the gradient flow system of the ANN parameters. This Lyapunov function is the central tool in our convergence proof of the gradient descent method.
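To make the setting concrete, the following is a minimal schematic formulation of the training problem described above; the symbols $d$, $H$, $\theta$, $\xi$, $\gamma$, $\mu$, and $\mathcal{G}$ are illustrative choices of notation and need not coincide with the notation used in the article. With input dimension $d \in \mathbb{N}$, hidden layer width $H \in \mathbb{N}$, and parameter vector $\theta = \bigl((w_j, b_j, v_j)_{j \in \{1, \dots, H\}}, c\bigr)$, the realization of the considered ANN, the risk with respect to a constant target value $\xi \in \mathbb{R}$ and an input distribution $\mu$ on $[a,b]^d$, and the gradient descent recursion with learning rate $\gamma \in (0,\infty)$ take the form
\[
\mathcal{N}^{\theta}(x) = c + \sum_{j=1}^{H} v_j \max\{\langle w_j, x\rangle + b_j,\, 0\}, \qquad
\mathcal{L}(\theta) = \int_{[a,b]^d} \bigl(\mathcal{N}^{\theta}(x) - \xi\bigr)^2 \, \mu(\mathrm{d}x),
\]
\[
\theta_{n+1} = \theta_n - \gamma\, \mathcal{G}(\theta_n), \qquad n \in \mathbb{N}_0,
\]
where $\mathcal{G}$ denotes a suitable generalized gradient of the risk $\mathcal{L}$ (the rectifier $x \mapsto \max\{x, 0\}$ fails to be differentiable at the origin), and the convergence statement asserts that $\lim_{n \to \infty} \mathcal{L}(\theta_n) = 0$.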