In this paper, we propose a unified convergence analysis for a class of generic shuffling-type gradient methods for solving finite-sum optimization problems. Our analysis works with any without-replacement sampling strategy and covers many known variants such as randomized reshuffling, deterministic or randomized single permutation, and cyclic and incremental gradient schemes. We focus on two different settings: strongly convex and nonconvex problems, but also discuss the non-strongly convex case. Our main contribution consists of new non-asymptotic and asymptotic convergence rates for a wide class of shuffling-type gradient methods in both nonconvex and convex settings. We also study uniformly randomized shuffling variants under different learning rates and model assumptions. While our rate in the nonconvex case is new and significantly improves over existing works under standard assumptions, the rate in the strongly convex case matches the best-known rates prior to this paper up to a constant factor, without imposing a bounded gradient condition. Finally, we empirically illustrate our theoretical results via two numerical examples: nonconvex logistic regression and neural network training. As a byproduct, our results suggest appropriate choices of diminishing learning rates for certain shuffling variants.
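To make the class of methods concrete, below is a minimal sketch of a shuffling-type gradient scheme for a finite-sum objective, written under illustrative assumptions: the names `shuffling_epoch`, `run_shuffling_sgd`, the callable `grad_i(i, w)` returning the i-th component gradient, and the per-step scaling `lr / n` are placeholders, not the paper's exact notation. Passing a random generator gives randomized reshuffling; omitting it gives the cyclic/incremental variant.

```python
import numpy as np


def shuffling_epoch(w, grad_i, n, lr, rng=None):
    """One epoch: visit each component gradient exactly once
    in a permuted order (sampling without replacement)."""
    if rng is not None:
        perm = rng.permutation(n)   # randomized reshuffling
    else:
        perm = np.arange(n)         # cyclic / incremental variant
    for i in perm:
        # Incremental step with the per-step scaling lr / n (an assumption here).
        w = w - (lr / n) * grad_i(i, w)
    return w


def run_shuffling_sgd(w0, grad_i, n, epochs, lr_schedule, seed=0):
    """Run several epochs with a per-epoch (possibly diminishing) learning rate."""
    rng = np.random.default_rng(seed)
    w = w0
    for t in range(epochs):
        w = shuffling_epoch(w, grad_i, n, lr_schedule(t), rng=rng)
    return w
```

A usage example with a diminishing schedule such as `lr_schedule = lambda t: 1.0 / (t + 1)` illustrates the kind of per-epoch learning-rate choices discussed in the paper; a fixed permutation computed once before the loop would correspond to the single-permutation variant.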