Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the curvature information of the optimization landscape efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the angle between consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information in addition to its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother, as a more accurate step size is captured through the angular information of the immediate past gradients. Two variants of AngularGrad are developed, based on whether the tangent or the cosine function is used to compute the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark datasets against state-of-the-art methods reveal the superior performance of AngularGrad. The source code will be made publicly available at: https://github.com/mhaut/AngularGrad.
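As a rough illustration of the idea described above, the sketch below implements an Adam-style update whose step size is scaled by a score derived from the angle between consecutive gradients, with a "tan" and a "cos" mode mirroring the two variants. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: the class name, the tanh squashing, and the 0.5 offset/scale of the score are illustrative choices; the precise coefficients should be taken from the reference implementation at the linked repository.

```python
import numpy as np

class AngularGradSketch:
    """Illustrative Adam-style optimizer whose step size is modulated by
    the angle between consecutive gradients. The exact form of the angular
    score below is an assumption for illustration, not the paper's formula."""

    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, mode="cos"):
        self.lr = lr
        self.b1, self.b2 = betas
        self.eps = eps
        self.mode = mode          # "tan" or "cos" variant
        self.m = None             # first-moment estimate (as in Adam)
        self.v = None             # second-moment estimate (as in Adam)
        self.prev_grad = None     # gradient from the previous iteration
        self.t = 0

    def _angular_score(self, grad):
        """Map the angle between consecutive gradients to a bounded score."""
        if self.prev_grad is None:
            return 1.0            # no history yet: plain Adam step
        g, p = grad.ravel(), self.prev_grad.ravel()
        cos_theta = np.clip(
            g @ p / (np.linalg.norm(g) * np.linalg.norm(p) + self.eps),
            -1.0, 1.0)
        theta = np.arccos(cos_theta)
        raw = np.tan(theta) if self.mode == "tan" else np.cos(theta)
        # tanh keeps the score bounded; the 0.5 offset/scale (giving a score
        # in (0.5, 1.0]) is an assumed squashing, chosen for illustration.
        return 0.5 * np.tanh(abs(raw)) + 0.5

    def step(self, params, grad):
        self.t += 1
        if self.m is None:
            self.m = np.zeros_like(grad)
            self.v = np.zeros_like(grad)
        score = self._angular_score(grad)
        self.prev_grad = grad.copy()
        # Standard Adam moment updates with bias correction.
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        # The angular score modulates the effective step size.
        return params - self.lr * score * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
opt = AngularGradSketch(lr=0.1, mode="tan")
x = np.array([3.0, -2.0])
for _ in range(200):
    x = opt.step(x, 2 * x)
print(x)  # approaches the origin
```

In this toy run, consecutive gradients stay nearly collinear, so the angle is small and the score settles near its lower bound, yielding a consistently damped (smoother) step, which matches the abstract's intuition that the angular information of past gradients stabilizes the step size.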