A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), was recently proposed, together with an adaptive variant. In this paper, improved algorithms based on both are investigated in order to improve the efficiency and robustness of the methodology. The improved approach offers a new perspective on scheduling learning-rate updates and is compared with stochastic gradient descent (SGD) with momentum or Nesterov momentum, as well as with the most successful adaptive learning-rate algorithm, Adam. The goal of this method is not to beat the others but to provide a different viewpoint on optimizing the gradient descent process. The approach combines the advantages of first-order and second-order optimization in terms of speed and efficiency.
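To make the comparison concrete, the sketch below implements the two baselines named above (standard SGD with momentum and standard Adam) on a toy quadratic, alongside an illustrative binary forward search over the step size that doubles the step while a forward probe keeps lowering the loss and halves it otherwise. The probe is only this author's hedged illustration of the general "binary forward exploration" idea; it is not the exact BFE algorithm from the paper, and the objective, step counts, and hyperparameters are arbitrary assumptions.

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x, with gradient A x (assumed for illustration).
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def sgd_momentum(x, steps=100, lr=0.05, beta=0.9):
    """Standard SGD with heavy-ball momentum."""
    v = np.zeros_like(x)
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
    return x

def adam(x, steps=100, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam update (Kingma & Ba, 2015), with bias correction."""
    m, v = np.zeros_like(x), np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

def binary_forward_probe(x, steps=100, lr=0.05):
    """Hypothetical sketch of a binary forward search over the step size:
    NOT the paper's BFE algorithm, just an illustration of the idea."""
    for _ in range(steps):
        g = grad(x)
        # Grow the step while a doubled forward probe still lowers the loss.
        while f(x - 2.0 * lr * g) < f(x - lr * g):
            lr *= 2.0
        # Shrink the step until it actually decreases the loss.
        while f(x - lr * g) >= f(x) and lr > 1e-12:
            lr *= 0.5
        x = x - lr * g
    return x

x0 = np.array([3.0, 2.0])
for name, opt in [("SGD+momentum", sgd_momentum),
                  ("Adam", adam),
                  ("binary probe", binary_forward_probe)]:
    print(f"{name:14s} -> f(x) = {f(opt(x0.copy())):.3e}")
```

On this toy problem all three drive the loss toward zero; the point of the probe variant is only to show how a step size can be scheduled automatically by forward exploration rather than fixed in advance.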