Escaping from saddle points and finding a local minimum is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach to this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the best existing stochastic gradient complexity for this type of algorithm is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima. We show that LENA with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size shrinkage scheme that controls the average movement of the iterates, which leads to faster convergence to a local minimum.
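To make the high-level mechanism concrete, the following is a minimal Python sketch of a perturbed stochastic gradient loop with a shrunken last step per epoch. It is an illustration of the general idea only, not the paper's actual algorithm: the epoch structure, the `shrink` factor, and all hyperparameter values are assumptions, and `grad_est` merely stands in for a variance-reduced estimator such as SARAH/SPIDER or STORM.

```python
import numpy as np

def perturbed_sgd_with_shrinkage(grad_est, x0, eta=0.05, sigma=1e-3,
                                 shrink=0.5, epochs=100, inner=20,
                                 rng=None):
    """Hedged sketch of a perturbed SGD loop with last-step shrinkage.

    grad_est: callable x -> stochastic gradient estimate (stand-in for
        a variance-reduced estimator such as SARAH/SPIDER or STORM).
    All parameter names and values are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        # Perturbation: inject isotropic noise so the iterate can
        # escape strict saddle points (standard perturbed-GD device).
        x = x + sigma * rng.standard_normal(x.shape)
        # Ordinary stochastic gradient steps within the epoch.
        for _ in range(inner):
            x = x - eta * grad_est(x)
        # Last-step shrinkage (assumed form): damp the final step of
        # the epoch to control the average movement of the iterates.
        x = x - shrink * eta * grad_est(x)
    return x
```

For instance, running this sketch on a toy nonconvex objective such as $f(x) = \tfrac{1}{2}(x_1^2 - x_2^2)$ with `grad_est = lambda x: np.array([x[0], -x[1]]) + 0.01 * np.random.randn(2)` shows the perturbation pushing iterates off the saddle at the origin; the shrinkage factor here is purely a plausible reading of "last step shrinkage", not the paper's specification.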