In recent years, various gradient descent algorithms, including vanilla gradient descent, gradient descent with momentum, adaptive gradient (AdaGrad), root-mean-square propagation (RMSProp), and adaptive moment estimation (Adam), have been applied to the parameter optimization of deep learning models to achieve higher accuracy or lower error. These optimization algorithms require setting the values of several hyperparameters, such as the learning rate and momentum coefficients, and both the convergence speed and the solution accuracy can depend strongly on these values. Therefore, this study proposes an analytical framework that uses mathematical models to analyze the mean error of an objective function under each of these gradient descent algorithms. The suitable value of each hyperparameter can then be determined by minimizing the mean error, and general principles for setting hyperparameter values are derived from the analysis results for model optimization. The experimental results show that the proposed method achieves faster convergence and lower errors.
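For reference, the update rules of the five optimizers named above can be sketched as follows. This is a minimal NumPy illustration using the standard textbook formulations with commonly cited default hyperparameters; the function names, signatures, and default values are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def sgd(w, grad, lr=0.01):
    # Vanilla gradient descent: step against the gradient.
    return w - lr * grad

def momentum(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: accumulate a velocity term to smooth the trajectory.
    v = beta * v + lr * grad
    return w - v, v

def adagrad(w, grad, g2, lr=0.01, eps=1e-8):
    # AdaGrad: scale each step by the running sum of squared gradients.
    g2 = g2 + grad ** 2
    return w - lr * grad / (np.sqrt(g2) + eps), g2

def rmsprop(w, grad, g2, lr=0.001, rho=0.9, eps=1e-8):
    # RMSProp: exponential moving average of squared gradients
    # instead of AdaGrad's unbounded sum.
    g2 = rho * g2 + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(g2) + eps), g2

def adam(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates
    # (t is the 1-indexed step count).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Each rule makes explicit which hyperparameters (lr, beta, rho, b1, b2) the proposed framework would tune by minimizing the mean error.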