Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called \emph{geometric step decay} and proceeds by halving the step size after every few epochs. In recent work, geometric step decay was shown to improve exponentially upon classical sublinear rates for the class of \emph{sharp} convex functions. In this work, we ask whether geometric step decay similarly improves stochastic algorithms for the class of sharp nonconvex problems. Such losses feature in modern statistical recovery problems and lead to a new challenge not present in the convex setting: the region of convergence is local, so one must bound the probability of escape. Our main result shows that, for a large class of stochastic, sharp, nonsmooth, and nonconvex problems, a geometric step decay schedule endows well-known algorithms with a local linear rate of convergence to global minimizers. This guarantee applies to the stochastic projected subgradient, proximal point, and prox-linear algorithms. As an application of our main result, we analyze two statistical recovery tasks---phase retrieval and blind deconvolution---matching the best known guarantees under Gaussian measurement models and establishing new guarantees under heavy-tailed distributions.
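To make the schedule concrete, here is a minimal sketch in generic notation; the symbols $f(\cdot\,;\xi)$, $\mathcal{X}$, $\lambda_0$, and $T$ are placeholders rather than the quantities fixed in our analysis. The stochastic projected subgradient method with geometric step decay iterates
\[
x_{t+1} = \operatorname{proj}_{\mathcal{X}}\bigl(x_t - \lambda_k\, g_t\bigr), \qquad g_t \in \partial_x f(x_t;\xi_t), \qquad \lambda_k = 2^{-k}\lambda_0,
\]
where the step size $\lambda_k$ is held fixed for the $T$ iterations comprising stage $k$ and is halved at the start of each new stage.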