In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected "mega-batchsizes". This leads to a challenging hyperparameter tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batchsizes, and still obtain the optimal rate for this setting. Yet their method, called STORM, crucially relies on knowledge of the smoothness, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batchsizes, and obtains the optimal $O(1/T^{1/3})$ rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively setting the learning rate and momentum parameters.
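To make the recursive-momentum idea concrete, the following is a minimal NumPy sketch of a STORM-style update. The schedules shown (`lr0`, `c`, and the $1/3$-power step-size rule) are illustrative placeholders in the spirit of STORM's schedules, not the parameter-free rules of STORM+; `grad_fn` is an assumed stochastic-gradient oracle that reuses the same sample when called with the same seed.

```python
import numpy as np

def storm_sketch(grad_fn, x0, T=1000, lr0=0.1, c=1.0):
    """Sketch of recursive momentum (Cutkosky & Orabona, 2019).

    grad_fn(x, seed) returns a stochastic gradient at x; calling it
    twice with the same seed must reuse the same sample, which the
    recursive update requires. lr0 and c are illustrative constants
    (in STORM they depend on the smoothness and gradient bound).
    """
    x = x0.copy()
    d = grad_fn(x, seed=0)           # initial estimator: one stochastic gradient
    g_sq_sum = np.dot(d, d)          # running sum of squared gradient norms
    for t in range(1, T + 1):
        eta = lr0 / (1.0 + g_sq_sum) ** (1.0 / 3.0)  # illustrative adaptive step size
        x_prev = x
        x = x - eta * d
        g_new = grad_fn(x, seed=t)       # gradient at the new point
        g_old = grad_fn(x_prev, seed=t)  # same sample, evaluated at the old point
        a = min(1.0, c * eta ** 2)       # illustrative momentum parameter
        # Recursive momentum: d_{t+1} = g_new + (1 - a)(d_t - g_old),
        # a variance-reduced estimator with no anchor points or mega-batches.
        d = g_new + (1.0 - a) * (d - g_old)
        g_sq_sum += np.dot(g_new, g_new)
    return x
```

The key design point is the correction term $(1-a_t)(d_{t-1} - \nabla f(x_{t-1};\xi_t))$, which requires evaluating the same stochastic sample at two consecutive iterates; this is what lets the estimator's variance shrink over time without the anchor points and mega-batchsizes of classical variance reduction. STORM+ differs in that both $\eta_t$ and $a_t$ are set adaptively from observed quantities, with no knowledge of the smoothness or gradient bound.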