We propose \textit{Meta-Regularization}, a novel approach to adaptively choosing the learning rate in first-order gradient descent methods. Our approach modifies the objective function by adding a regularization term on the learning rate, and casts the joint update of the parameters and the learning rate as a max-min problem. For any choice of regularization term, our approach yields practical algorithms. When \textit{Meta-Regularization} takes a $\varphi$-divergence as the regularizer, the resulting algorithms achieve theoretical convergence guarantees comparable to those of other first-order gradient-based algorithms. Furthermore, we prove that suitably designed regularizers can improve the convergence rate when the objective function is strongly convex. Numerical experiments on benchmark problems demonstrate the effectiveness of the algorithms derived from several common $\varphi$-divergences in both full-batch and online learning settings.
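To make the idea concrete, the display below sketches one plausible form of such a max-min problem; the step-size variable $\eta$, the trade-off weight $\beta$, and the linearized proximal model of $f$ around the current iterate $w_t$ are illustrative assumptions rather than the exact formulation developed in the paper:
\begin{equation*}
% Illustrative sketch only: the linearized model, \eta, and \beta are assumed notation.
\max_{\eta>0}\;\min_{w}\;
\Big\{ f(w_t) + \nabla f(w_t)^{\top}(w - w_t)
      + \frac{1}{2\eta}\,\lVert w - w_t\rVert_2^2
      - \beta\,\varphi(\eta) \Big\},
\end{equation*}
where the inner minimization over $w$ yields the parameter update and the outer maximization over $\eta$ yields the learning rate. If $\varphi$ penalizes deviations of $\eta$ from a reference step size (growing without bound as $\eta \to 0$), the maximization admits an interior solution, so large gradients shrink $\eta$ while small gradients leave it near the reference value.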