Adaptive Moment Estimation (ADAM) is a very popular training algorithm for deep neural networks and belongs to the family of adaptive gradient descent optimizers. However, to the best of the authors' knowledge, no complete convergence analysis exists for ADAM. The contribution of this paper is a method for the local convergence analysis in batch mode for a deterministic fixed training set, which yields necessary conditions on the hyperparameters of the ADAM algorithm. Due to the local nature of the arguments, the objective function can be non-convex but must be at least twice continuously differentiable. We then apply this procedure to other adaptive gradient descent algorithms and show local convergence with hyperparameter bounds for most of them.
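For reference, the ADAM update analyzed here can be sketched as follows. This is a minimal full-batch (deterministic) implementation of the standard ADAM recursion with bias correction; the function name `adam_step`, the toy objective, and the default hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update (batch mode: grad is the full deterministic gradient)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative use: minimize f(x) = x^2, whose full-batch gradient is 2x.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, alpha=0.01)
```

The paper's local analysis concerns exactly this deterministic setting: the same gradient map is applied at every step, so convergence conditions on alpha, beta1, and beta2 can be studied near a critical point of a twice continuously differentiable objective.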