Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior of discrete trajectories of first-order methods within the geometrical landscape of these functions. This paper concerns convergence of first-order discrete methods to a local minimum of nonconvex optimization problems that comprise strict saddle points within the geometrical landscape. To this end, it focuses on analysis of discrete gradient trajectories around saddle neighborhoods, derives sufficient conditions under which these trajectories can escape strict-saddle neighborhoods in linear time, explores the contractive and expansive dynamics of these trajectories in neighborhoods of strict-saddle points that are characterized by gradients of moderate magnitude, characterizes the non-curving nature of these trajectories, and highlights the inability of these trajectories to re-enter the neighborhoods around strict-saddle points after exiting them. Based on these insights and analyses, the paper then proposes a simple variant of the vanilla gradient descent algorithm, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time. Convergence analysis of the CCRGD algorithm, which includes its rate of convergence to a local minimum within a geometrical landscape that has a maximum number of strict-saddle points, is also presented in the paper. Numerical experiments are then provided on a test function as well as a low-rank matrix factorization problem to evaluate the efficacy of the proposed algorithm.
翻译:与梯度相关的一级方法已成为大规模数字优化问题的工作马匹。 其中许多问题涉及的是: 大规模数字优化问题。 这些问题中有许多是非对等的客观函数, 包括多个马鞍点, 这需要理解这些函数的几何景观内一级方法的离散轨迹。 本文涉及一级离散方法与当地最低非对等优化问题的趋同, 其中包括几何景观内的严格的马鞍点。 为此, 它侧重于分析马鞍周围的离散梯度梯度轨迹, 产生足够的条件让这些轨迹能够在线性时间里摆脱严格快速调整频度的周边, 从而需要了解这些功能中度位置周围的一阶梯度的离散轨轨轨轨迹。 本文涉及将一级离散方法与当地非对等优化问题相融合, 并着重指出这些轨迹轨迹无法在离开后重新进入附近区域。 根据这些洞察和分析, 纸质的轨迹轨迹轨迹在直径径直径径径径的周围, 纸质的轨函数会使用一个直径直径直的轨变法, 。