Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as the default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the compression parameter, and applying additional controlled linear perturbations for Random Reshuffling, introduced by Malinovsky et al. (2021), helps to eliminate variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions, without bounded gradient assumptions and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real data sets.
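To make the basic building block concrete, below is a minimal sketch of one epoch of plain Random Reshuffling, i.e., SGD with sampling without replacement: every data point is visited exactly once per epoch in a freshly shuffled order. This is only an illustration of the RR step and not the paper's FedRR or its compressed/variance-reduced variants; the function names (`rr_epoch`, `grads`), the step size, and the toy quadratic objective are assumptions introduced for this example.

```python
import numpy as np

def rr_epoch(x, grads, lr, rng):
    """One epoch of Random Reshuffling (sketch): draw a fresh permutation of the
    n samples and take one gradient step per sample, visiting each exactly once.
    `grads` is a list of per-sample gradient functions (an illustrative assumption;
    the paper works with general finite-sum losses)."""
    n = len(grads)
    perm = rng.permutation(n)           # sampling without replacement
    for i in perm:
        x = x - lr * grads[i](x)        # incremental step on the i-th sample
    return x

# Tiny usage example on the finite-sum quadratic f(x) = (1/n) * sum_i (x - a_i)^2 / 2,
# whose minimizer is the mean of the a_i.
rng = np.random.default_rng(0)
a = rng.normal(size=10)
grads = [lambda x, ai=ai: x - ai for ai in a]   # per-sample gradients
x = 0.0
for epoch in range(100):
    x = rr_epoch(x, grads, lr=0.05, rng=rng)
print(x, a.mean())   # x approaches a.mean(), up to the variance induced by shuffling
```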