Variance-reduced stochastic gradient methods have gained popularity in recent years. Several variants exist that differ in their strategies for storing and sampling gradients, and this work concerns the interaction between these two aspects. We present a general proximal variance-reduced gradient method and analyze it under strong convexity assumptions. Special cases of the algorithm include SAGA, L-SVRG, and their proximal variants. Our analysis sheds light on epoch-length selection and the need to balance the convergence of the iterates with how often gradients are stored. The analysis improves on other convergence rates found in the literature and yields a new, faster-converging sampling strategy for SAGA. Problem instances for which the predicted rates match the rates observed in practice are presented, together with problems based on real-world data.
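As a rough illustration of the kind of method covered by this family, the following is a minimal sketch of a loopless proximal variance-reduced update in the style of L-SVRG, applied to an assumed least-squares-plus-l1 test problem. It is not the paper's general algorithm; the function names, step size, and problem data are illustrative assumptions. The snapshot-refresh probability p plays the role of the gradient-storage frequency discussed above.

```python
# Sketch of a loopless proximal variance-reduced step (L-SVRG style).
# The test problem min (1/n) sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1
# and all parameter choices are illustrative assumptions.
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_l_svrg(A, b, lam, step, p, n_iters, rng=None):
    """Loopless proximal SVRG sketch; p controls how often the stored
    full-gradient snapshot is refreshed (the storage/epoch-length trade-off)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    w = x.copy()                       # snapshot point
    mu = A.T @ (A @ w - b) / n         # stored full gradient at the snapshot
    for _ in range(n_iters):
        i = rng.integers(n)
        grad_i_x = A[i] * (A[i] @ x - b[i])
        grad_i_w = A[i] * (A[i] @ w - b[i])
        v = grad_i_x - grad_i_w + mu   # variance-reduced gradient estimate
        x = soft_threshold(x - step * v, step * lam)
        if rng.random() < p:           # refresh snapshot with probability p
            w = x.copy()
            mu = A.T @ (A @ w - b) / n
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 20))
    b = rng.standard_normal(200)
    x_hat = prox_l_svrg(A, b, lam=0.1, step=0.01, p=1 / 200, n_iters=5000, rng=rng)
    print(np.round(x_hat, 3))
```

Choosing p on the order of 1/n mimics refreshing the stored gradient roughly once per epoch; the abstract's point is that this refresh frequency must be balanced against the convergence of the iterates.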