Gradient descent converges slowly on ill-conditioned and non-convex problems. An important acceleration technique is step-size adaptation. The first part of this paper gives a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and of the relation between step-size adaptation and meta-gradient methods. In the second part, we propose a new class of methods for accelerating gradient descent that differ from existing techniques. The new methods, which we call {\em step-size planning}, use the {\em update experience} to learn an improved way of updating the parameters. The methods organize the experience at intervals of $K$ steps to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model, a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning over multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large-scale applications. We show that, for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, which is the theoretical limit for the convergence rate of first-order methods; here $\mu$ is the strong convexity constant of the loss function $F$ and $L$ is the Lipschitz constant of $F'$. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error in fewer than 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach $10^{-3}$ accuracy. We discuss the connection of step-size planning to planning in reinforcement learning, in particular to Dyna architectures.
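To make the Rosenbrock baseline concrete, the following is a minimal sketch of plain gradient descent with a fixed step-size on the two-dimensional Rosenbrock function, counting one gradient evaluation per update. It is not Csawg or any step-size planning method. The starting point $(-1.2, 1)$, the step-size $10^{-3}$, and all function names are assumptions made for illustration; the abstract does not state the settings behind the reported numbers, so the error this sketch reaches need not match them.

```python
def rosenbrock(x, y):
    """Two-dimensional Rosenbrock function; global minimum 0 at (1, 1)."""
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2


def rosenbrock_grad(x, y):
    """Analytic gradient of the Rosenbrock function."""
    dx = -2.0 * (1.0 - x) - 400.0 * x * (y - x * x)
    dy = 200.0 * (y - x * x)
    return dx, dy


def gradient_descent(x, y, step_size, num_grad_evals):
    """Plain gradient descent with a fixed scalar step-size.

    Each iteration uses exactly one gradient evaluation, so
    num_grad_evals is also the number of parameter updates.
    """
    for _ in range(num_grad_evals):
        dx, dy = rosenbrock_grad(x, y)
        x -= step_size * dx
        y -= step_size * dy
    return x, y


if __name__ == "__main__":
    # Assumed settings (not taken from the paper): a common starting
    # point and a small fixed step-size that keeps the iteration stable.
    x0, y0 = -1.2, 1.0
    for budget in (500, 10000):
        x, y = gradient_descent(x0, y0, step_size=1e-3, num_grad_evals=budget)
        print(f"{budget:6d} gradient evaluations: loss = {rosenbrock(x, y):.3e}")
```

Running the script prints the remaining loss after 500 and after 10000 gradient evaluations, which gives a rough sense of how slowly a fixed step-size moves along the curved valley of this function.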