In this paper, we aim to provide an introduction to gradient descent based optimization algorithms for learning deep neural network models. Deep learning models involving multiple nonlinear projection layers are very challenging to train. Today, most deep learning model training still relies on the backpropagation algorithm. In backpropagation, the model variables are updated iteratively until convergence using gradient descent based optimization algorithms. Besides the conventional vanilla gradient descent algorithm, many gradient descent variants have been proposed in recent years to improve learning performance, including Momentum, Adagrad, Adam, and Gadam, among others, all of which are introduced in this paper.
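As a rough illustration of the iterative update described above, the following minimal sketch runs vanilla gradient descent until the parameters stop changing. The function names (`gradient_descent`, `quadratic_grad`), the learning rate, the stopping tolerance, and the toy quadratic objective are all assumptions chosen for illustration; they are not taken from the paper, and a real deep model would obtain its gradients through backpropagation rather than a closed-form expression.

```python
def gradient_descent(grad, theta0, lr=0.1, tol=1e-8, max_iters=10_000):
    """Vanilla gradient descent: theta <- theta - lr * grad(theta).

    Stops when the largest per-coordinate change falls below `tol`
    or after `max_iters` updates (both are illustrative defaults).
    """
    theta = list(theta0)
    for step in range(max_iters):
        g = grad(theta)
        new_theta = [t - lr * gi for t, gi in zip(theta, g)]
        # Treat a negligible parameter change as convergence.
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta, step + 1
        theta = new_theta
    return theta, max_iters


def quadratic_grad(theta):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + 2 * (y + 1)^2."""
    x, y = theta
    return [2.0 * (x - 3.0), 4.0 * (y + 1.0)]


# The toy objective has its minimum at (3, -1).
theta_star, n_steps = gradient_descent(quadratic_grad, [0.0, 0.0])
print(theta_star, n_steps)
```

Variants such as Momentum, Adagrad, and Adam keep this loop but change how the step is computed from the gradient.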