How Good is SGD with Random Shuffling?

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions. This setting has been investigated in several recent works, but the optimal error rates remain unclear. In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. In particular, we prove that after $k$ passes over $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2 + 1/(nk^3)\right)$, which partially corresponds to recently derived upper bounds. Moreover, if the functions are only shuffled once, then the lower bound increases to $\Omega\left(1/(nk^2)\right)$. Since there are strictly smaller upper bounds for repeated reshuffling, this proves an inherent performance gap between SGD with single shuffling and repeated shuffling. As a more minor contribution, we also provide a non-asymptotic $\Omega(1/k^2)$ lower bound (independent of $n$) for the incremental gradient method, when no random shuffling takes place. Finally, we provide an indication that our lower bounds are tight, by proving matching upper bounds for univariate quadratic functions.
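
The sampling schemes compared in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction: the function names (`epoch_orders`, `run_sgd`), the choice of univariate quadratics $f_i(x) = \frac{1}{2}(x - c_i)^2$, the centers $c_i$, and the step size are all assumptions made for illustration. It only shows how the four orderings differ (sampling with replacement, reshuffling every pass, shuffling once, and a fixed incremental order) under a constant step size.

```python
import random


def epoch_orders(n, k, scheme, rng):
    """Return k index sequences, one per pass over n individual functions."""
    if scheme == "with_replacement":
        # Each step draws an index independently and uniformly.
        return [[rng.randrange(n) for _ in range(n)] for _ in range(k)]
    if scheme == "reshuffle":
        # A fresh random permutation for every pass.
        return [rng.sample(range(n), n) for _ in range(k)]
    if scheme == "single_shuffle":
        # One random permutation, reused for every pass.
        perm = rng.sample(range(n), n)
        return [list(perm) for _ in range(k)]
    if scheme == "incremental":
        # Fixed order, no randomness at all.
        return [list(range(n)) for _ in range(k)]
    raise ValueError(f"unknown scheme: {scheme}")


def run_sgd(centers, k, eta, scheme, rng, x0=1.0):
    """Constant-step SGD on F(x) = (1/n) * sum_i 0.5 * (x - centers[i])**2.

    Returns the optimization error F(x_final) - F(x*), which for this
    objective equals 0.5 * (x_final - mean(centers))**2.
    """
    n = len(centers)
    x = x0
    for order in epoch_orders(n, k, scheme, rng):
        for i in order:
            x -= eta * (x - centers[i])  # gradient of f_i at x is x - c_i
    x_star = sum(centers) / n
    return 0.5 * (x - x_star) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, trials = 20, 10, 200
    centers = [(-1.0) ** i for i in range(n)]  # illustrative data, mean zero
    eta = 1.0 / n  # illustrative constant step size, not tuned
    for scheme in ("with_replacement", "reshuffle", "single_shuffle", "incremental"):
        avg = sum(run_sgd(centers, k, eta, scheme, rng) for _ in range(trials)) / trials
        print(f"{scheme:>16}: average error {avg:.3e}")
```

Averaging over many trials approximates the expected optimization error that the bounds refer to. A single small instance such as this one cannot confirm or refute asymptotic rates, so the printed numbers should be read only as a qualitative comparison.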
