Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs in both convex and nonconvex settings. Most of them require a large number of samples in some or all iterations of the improved SGMs. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm can achieve the optimal complexity result $O(\varepsilon^{-3})$ to produce a stochastic $\varepsilon$-stationary solution, if a mean-squared smoothness condition holds and $\Theta(\varepsilon^{-1})$ samples are available for the initial update. Unlike existing optimal methods, PStorm can still achieve a near-optimal complexity result $\tilde{O}(\varepsilon^{-3})$ by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm with small-batch training can generalize better than both the vanilla SGM and other optimal methods that require large-batch training, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
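To make the idea of a momentum-based variance-reduced proximal update concrete, the following is a minimal sketch and not the authors' implementation. It assumes a STORM-style gradient estimator followed by a proximal step, applied to a toy objective $\tfrac{1}{2}\mathbb{E}\|x-\xi\|^2 + \lambda\|x\|_1$. The function names, the toy objective, and the constant step size `eta` and momentum parameter `beta` are all illustrative assumptions; they are not the parameter schedule analysed in the paper.

```python
import random

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def stoch_grad(x, xi):
    """Gradient of the toy sample loss 0.5 * ||x - xi||^2 (hypothetical objective)."""
    return [xj - sj for xj, sj in zip(x, xi)]

def momentum_vr_prox_sgm(x0, sample, steps=2000, eta=0.05, beta=0.1, lam=0.1):
    """Sketch of a proximal SGM with a momentum-based variance-reduced estimator.

    `sample` is a hypothetical callable returning one fresh data point per call.
    Constant `eta` and `beta` are simplifying assumptions made for illustration.
    """
    x_prev = list(x0)
    # Initial estimate from a single sample.
    d = stoch_grad(x_prev, sample())
    x = soft_threshold([xj - eta * dj for xj, dj in zip(x_prev, d)], eta * lam)
    for _ in range(steps):
        xi = sample()  # one new sample, reused at both the new and old iterate
        g_new = stoch_grad(x, xi)
        g_old = stoch_grad(x_prev, xi)
        # d_k = g(x_k; xi_k) + (1 - beta) * (d_{k-1} - g(x_{k-1}; xi_k))
        d = [gn + (1.0 - beta) * (dj - go) for gn, dj, go in zip(g_new, d, g_old)]
        x_prev = x
        # Proximal step handles the nonsmooth L1 term.
        x = soft_threshold([xj - eta * dj for xj, dj in zip(x, d)], eta * lam)
    return x
```

Each iteration draws exactly one sample and evaluates the stochastic gradient at both the current and the previous iterate, which is what allows the estimator to correct its own drift without a large batch. For the toy objective, the minimizer is the soft-thresholded mean of the data, so the output can be checked against that value.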