Classical global convergence results for first-order methods rely on uniform smoothness and the \L{}ojasiewicz inequality. Motivated by properties of objective functions that arise in machine learning, we propose a non-uniform refinement of these notions, leading to \emph{Non-uniform Smoothness} (NS) and \emph{Non-uniform \L{}ojasiewicz inequality} (N\L{}). The new definitions inspire new geometry-aware first-order methods that are able to converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds. To illustrate the power of these geometry-aware methods and their corresponding non-uniform analysis, we consider two important problems in machine learning: policy gradient optimization in reinforcement learning (PG), and generalized linear model training in supervised learning (GLM). For PG, we find that normalizing the gradient ascent method can accelerate convergence to $O(e^{-t})$ while incurring less overhead than existing algorithms. For GLM, we show that geometry-aware normalized gradient descent can also achieve a linear convergence rate, which significantly improves the best known results. We additionally show that the proposed geometry-aware descent methods escape landscape plateaus faster than standard gradient descent. Experimental results are used to illustrate and complement the theoretical findings.
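To make the central update concrete, below is a minimal sketch of the plain normalized gradient descent step $x_{t+1} = x_t - \eta\,\nabla f(x_t)/\lVert\nabla f(x_t)\rVert$ that underlies the geometry-aware methods described above. This is an illustrative assumption, not the paper's exact algorithm: the paper's geometry-aware variants further adapt the step to the local NS/N\L{} constants, and the function name, the step size `eta`, the tolerance `eps`, and the quartic test objective are all choices made here for the example.

```python
import numpy as np

def normalized_gradient_descent(grad, x0, eta=0.01, num_steps=1000, eps=1e-12):
    """Sketch of the plain normalized gradient descent update.

    `grad` returns the gradient of the objective at x. All hyperparameters
    are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad(x)
        norm = np.linalg.norm(g)
        if norm < eps:  # (near-)stationary point reached
            break
        # Rescale the gradient to unit length, so the step length stays
        # constant regardless of how flat or steep the landscape is.
        x = x - eta * g / norm
    return x

# Example: f(x) = ||x||^4 has a flat plateau around its minimizer at 0,
# where vanilla gradient descent slows to a crawl because the gradient
# vanishes; the normalized update keeps a constant step length instead.
grad = lambda x: 4.0 * np.dot(x, x) * x  # gradient of ||x||^4
x_star = normalized_gradient_descent(grad, x0=np.ones(5))
```

With a fixed `eta`, the iterate settles within a ball of radius $O(\eta)$ around the minimizer rather than converging exactly; a decaying or geometry-adapted step size removes this limitation, which is part of what the geometry-aware analysis addresses.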