Tuning optimizer hyperparameters, notably the learning rate, to a particular optimization instance is an important but nonconvex problem. Iterative optimization methods such as hypergradient descent therefore lack global optimality guarantees in general. We propose an online nonstochastic control methodology for mathematical optimization. The choice of hyperparameters for gradient-based methods, including the learning rate, momentum parameter, and preconditioner, is cast as feedback control. The optimal solution to this control problem is shown to encompass preconditioned adaptive gradient methods with varying acceleration and momentum parameters. Although the optimal control problem is itself nonconvex, we show how recent methods from online nonstochastic control based on convex relaxation can be applied to compete with the best offline solution. This guarantees that, in episodic optimization, we converge to the best optimization method in hindsight.
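To make the feedback-control view of hyperparameter choice concrete, the following is a minimal sketch assuming a quadratic objective and a scalar learning rate adapted by a simple hypergradient-style feedback signal. The controller here is an illustrative stand-in, not the convex-relaxation controller described in the abstract, and all names and step sizes are hypothetical.

```python
import numpy as np

# Illustrative sketch: hyperparameters (learning rate eta_t, momentum beta_t)
# are treated as control inputs acting on the optimizer "dynamics"
#     x_{t+1} = x_t - eta_t * grad f(x_t) + beta_t * (x_t - x_{t-1}).
# eta_t is adapted by a hypergradient-descent feedback rule; this is NOT the
# paper's algorithm, only a stand-in to make the feedback-control view concrete.

def f(x, A, b):
    """Quadratic objective f(x) = 0.5 x^T A x - b^T x (assumed for the demo)."""
    return 0.5 * x @ A @ x - b @ x

def grad_f(x, A, b):
    return A @ x - b

def run(T=200, dim=5, seed=0):
    rng = np.random.default_rng(seed)
    A = np.diag(rng.uniform(0.5, 10.0, size=dim))  # ill-conditioned quadratic
    b = rng.normal(size=dim)

    x_prev = x = rng.normal(size=dim)
    eta, beta = 0.01, 0.5        # initial control inputs (hyperparameters)
    hyper_lr = 1e-4              # step size of the feedback (hypergradient) loop
    g_prev = grad_f(x, A, b)

    for t in range(T):
        g = grad_f(x, A, b)
        # Hypergradient feedback: d f(x_t)/d eta ~ -g_t . g_{t-1}, so increase
        # eta when consecutive gradients align and decrease it otherwise.
        eta = max(1e-5, eta + hyper_lr * float(g @ g_prev))
        # Controlled optimizer dynamics; momentum acts as a second control input.
        x_next = x - eta * g + beta * (x - x_prev)
        x_prev, x, g_prev = x, x_next, g

    return f(x, A, b)

if __name__ == "__main__":
    print("final objective value:", run())
```

As the abstract notes, this per-instance feedback loop is nonconvex in the hyperparameters, which is what motivates replacing it with an online nonstochastic control formulation that admits convex relaxation and regret guarantees.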