Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization. We propose two new algorithms: Proximal and Federated Random Reshuffling (ProxRR and FedRR). The first algorithm, ProxRR, solves composite convex finite-sum minimization problems in which the objective is the sum of a (potentially non-smooth) convex regularizer and an average of $n$ smooth objectives. We obtain the second algorithm, FedRR, as a special case of ProxRR applied to a reformulation of distributed problems with either homogeneous or heterogeneous data. We study the algorithms' convergence properties with constant and decreasing stepsizes, and show that they have considerable advantages over Proximal and Local SGD. In particular, our methods have superior complexities, and ProxRR evaluates the proximal operator only once per epoch. When the proximal operator is expensive to compute, this small difference makes ProxRR up to $n$ times faster than algorithms that evaluate the proximal operator in every iteration. We give examples of practical optimization tasks where the proximal operator is difficult to compute and ProxRR has a clear advantage. Finally, we corroborate our results with experiments on real data sets.
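To make the structure described above concrete, the following is a minimal Python sketch of a ProxRR-style loop, written under stated assumptions rather than as the paper's exact pseudocode: the helper names `prox_rr`, `grads`, and `prox` are hypothetical, and the choice of applying the proximal operator with stepsize $\gamma n$ at the end of each epoch is an assumption consistent with the abstract's claim that the operator is evaluated only once per epoch.

```python
import numpy as np

def prox_rr(x0, grads, prox, gamma, n, epochs, rng=None):
    """Hypothetical sketch of a ProxRR-style method.

    grads(i, x): gradient of the i-th smooth component f_i at x.
    prox(x, t):  proximal operator of the regularizer with stepsize t.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(epochs):
        perm = rng.permutation(n)       # reshuffle the data each epoch
        for i in perm:                  # n plain SGD steps, no prox inside the epoch
            x = x - gamma * grads(i, x)
        x = prox(x, gamma * n)          # single proximal step per epoch (assumed stepsize gamma*n)
    return x

# Illustrative usage on a toy lasso problem: f_i(x) = 0.5*(a_i^T x - b_i)^2, R(x) = lam*||x||_1.
A, b, lam = np.random.randn(100, 20), np.random.randn(100), 0.1
grads = lambda i, x: (A[i] @ x - b[i]) * A[i]
prox = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - lam * t, 0.0)  # soft-thresholding
x = prox_rr(np.zeros(20), grads, prox, gamma=1e-3, n=100, epochs=50)
```

In this sketch, moving the proximal evaluation outside the inner loop is exactly what reduces the number of prox calls from $n$ per epoch to one, which is where the up-to-$n$-fold speedup for expensive proximal operators would come from.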