In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-{\L}ojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
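To make the shuffling-based sampling concrete, here is a minimal sketch of local SGD with Random Reshuffling on a toy one-dimensional quadratic (hence PL) objective. The function name, step size, toy data, and the single-scalar model are illustrative assumptions, not the paper's setup; the point is only the mechanics: each machine permutes its local samples once per epoch (without-replacement sampling), runs local SGD steps over that permutation, and the machines then average their iterates.

```python
import numpy as np

def local_random_reshuffling(data, lr=0.1, epochs=20, seed=0):
    """Illustrative sketch (not the paper's exact algorithm or setting).

    Local SGD where each machine draws gradients WITHOUT replacement:
    at the start of every epoch it shuffles its local samples, sweeps
    through them once, and then all machines average their iterates.

    `data` is a list of per-machine arrays of scalars a_i; the toy
    objective is f(x) = mean_i (x - a_i)^2 / 2, which satisfies the
    Polyak-Lojasiewicz condition.
    """
    rng = np.random.default_rng(seed)
    x = 0.0  # shared initial iterate
    for _ in range(epochs):
        local_iterates = []
        for samples in data:  # each machine works locally between rounds
            perm = rng.permutation(len(samples))  # reshuffle: no replacement
            x_local = x
            for idx in perm:  # one pass over the shuffled local data
                grad = x_local - samples[idx]  # gradient of (x - a_i)^2 / 2
                x_local -= lr * grad
            local_iterates.append(x_local)
        x = float(np.mean(local_iterates))  # communication: average models
    return x

# Toy usage: two machines with mildly heterogeneous local data.
data = [np.array([1.0, 1.2, 0.9]), np.array([1.1, 0.8, 1.0])]
print(local_random_reshuffling(data))
```

Replacing the per-epoch permutation with independent uniform draws would recover the with-replacement variant that most prior analyses assume; the only change in the sketch is how the sample indices are generated.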