AdaBelief, one of the current best optimizers, demonstrates superior generalization ability compared to the popular Adam algorithm by viewing the exponential moving average of observed gradients as the prediction of the next gradient and adapting its step size according to the deviation from this prediction. AdaBelief is theoretically appealing in that it enjoys a data-dependent $O(\sqrt{T})$ regret bound when the objective functions are convex, where $T$ is the time horizon. It remains, however, an open problem whether this convergence rate can be further improved without sacrificing generalization ability. To this end, we make a first attempt in this work and design a novel optimization algorithm, FastAdaBelief, that exploits the strong convexity of the objective function to achieve an even faster convergence rate. In particular, by adjusting the step size so that it better accounts for strong convexity and suppresses fluctuations, the proposed FastAdaBelief demonstrates excellent generalization ability as well as superior convergence. As an important theoretical contribution, we prove that FastAdaBelief attains a data-dependent $O(\log T)$ regret bound, which is substantially lower than that of AdaBelief. On the empirical side, we validate our theoretical analysis with extensive experiments on three popular baseline models, covering both the strongly convex and non-strongly convex scenarios. The experimental results are very encouraging: FastAdaBelief converges the fastest among all mainstream algorithms while maintaining excellent generalization ability, in both the strongly convex and non-strongly convex cases. FastAdaBelief is thus posited as a new benchmark model for the research community.
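To illustrate the high-level idea, the following is a minimal sketch (not the exact FastAdaBelief update, which is developed in the paper body) of an AdaBelief-style iteration in which the base step size decays as $\alpha/t$, the standard device for exploiting strong convexity to obtain a logarithmic regret bound. Here $g_t$ denotes the stochastic gradient at step $t$, $m_t$ its exponential moving average, $s_t$ the moving average of the squared deviation $(g_t - m_t)^2$, and $\beta_1$, $\beta_2$, $\epsilon$, $\alpha$ the usual AdaBelief hyperparameters:
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \\
s_t &= \beta_2 s_{t-1} + (1-\beta_2)\,(g_t - m_t)^2, \\
\theta_{t+1} &= \theta_t - \frac{\alpha}{t}\cdot\frac{m_t}{\sqrt{s_t} + \epsilon}.
\end{align*}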