Stochastic gradient methods (SGMs) have been extensively used for solving stochastic problems and large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs for both convex and nonconvex cases. Most of them require a large number of samples in some or all of their iterations. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm achieves the optimal complexity result $O(\varepsilon^{-3})$ for producing a stochastic $\varepsilon$-stationary solution, provided that a mean-squared smoothness condition holds. Unlike existing optimal methods, PStorm can achieve the ${O}(\varepsilon^{-3})$ result by using only one or $O(1)$ samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or $O(1)$ new observations. In addition, for large-scale machine learning problems, PStorm with small-batch training can generalize better than other optimal methods that require large-batch training, as well as the vanilla SGM, as we demonstrate on training a sparse fully-connected neural network and a sparse convolutional neural network.
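To make the single-sample, momentum-based variance reduction concrete, below is a minimal sketch of a PStorm-style proximal update on a toy $\ell_1$-regularized least-squares problem. The toy problem, the momentum weight `beta`, and the $O(k^{-1/3})$ step size are illustrative assumptions for this sketch, not the paper's exact algorithm or parameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 20, 0.1
x_true = rng.normal(size=n) * (rng.random(n) < 0.3)  # sparse ground truth

def sample():
    """Draw one data point (a, b) for the stochastic objective."""
    a = rng.normal(size=n)
    return a, a @ x_true + 0.01 * rng.normal()

def stoch_grad(x, a, b):
    """Stochastic gradient of 0.5*(a^T x - b)^2 at x for one sample."""
    return (a @ x - b) * a

def prox_l1(x, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.zeros(n)
x_prev = x.copy()
d = np.zeros(n)
for k in range(1, 5001):
    eta = 0.1 / k ** (1 / 3)              # assumed O(k^{-1/3}) step size
    beta = min(1.0, 1.0 / k ** (2 / 3))   # assumed momentum weight schedule
    a, b = sample()                       # one new sample per update
    g_new = stoch_grad(x, a, b)
    g_old = stoch_grad(x_prev, a, b)      # same sample at the previous iterate
    # Momentum-based variance-reduced estimator (STORM-style):
    #   d_k = grad(x_k) + (1 - beta_k) * (d_{k-1} - grad(x_{k-1}))
    d = g_new + (1.0 - beta) * (d - g_old)
    x_prev = x.copy()
    x = prox_l1(x - eta * d, eta * lam)   # proximal (soft-thresholding) step

print("recovery error:", np.linalg.norm(x - x_true))
```

The key point the sketch illustrates is that each iteration touches only one fresh sample: the variance reduction comes from reusing that sample at both the current and previous iterates inside the momentum recursion, rather than from a large batch.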