We present a new method that combines three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, compressed communication, and partial participation. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Moreover, we observe that "1 + 1 + 1 is not 3": by mixing variance reduction of stochastic gradients with compressed communication and partial participation, we do not obtain a fully synergistic effect. We explain the nature of this phenomenon, argue that it is to be expected, and propose possible workarounds.
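Since the abstract is high-level, the following is a minimal sketch of how the three ingredients can coexist in a single update rule: an SVRG-style variance-reduced stochastic gradient, unbiased rand-k sparsification for compressed communication, and random client subsampling for partial participation. The toy least-squares problem, the specific estimators, and all parameter values are assumptions made purely for illustration; this is not the method proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an assumption, for illustration only): n clients, each holding
# a least-squares objective f_i(x) = 1/(2m) * ||A_i x - b_i||^2; the global
# objective is the average of the f_i.
n_clients, m, dim = 20, 30, 50
A = [rng.standard_normal((m, dim)) for _ in range(n_clients)]
b = [rng.standard_normal(m) for _ in range(n_clients)]

def full_grad(i, x):
    """Exact local gradient of client i (used only to refresh the anchor)."""
    return A[i].T @ (A[i] @ x - b[i]) / m

def svrg_grad(i, x, x_ref, g_ref):
    """Ingredient 1, variance reduction: SVRG-style estimator built from a
    single random data point and corrected by the anchor gradient g_ref."""
    j = rng.integers(m)
    a, y = A[i][j], b[i][j]
    return a * (a @ x - y) - a * (a @ x_ref - y) + g_ref

def rand_k(v, k):
    """Ingredient 2, compressed communication: unbiased rand-k sparsifier
    that keeps k random coordinates and rescales to preserve the mean."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out

x = np.zeros(dim)
x_ref = x.copy()
g_ref = [full_grad(i, x_ref) for i in range(n_clients)]

lr, k, cohort, anchor_every = 0.01, 10, 5, 100
for t in range(1000):
    # Ingredient 3, partial participation: only a random cohort of clients
    # computes and communicates in this round.
    S = rng.choice(n_clients, size=cohort, replace=False)
    msgs = [rand_k(svrg_grad(i, x, x_ref, g_ref[i]), k) for i in S]
    x -= lr * np.mean(msgs, axis=0)      # server averages and takes a step
    if (t + 1) % anchor_every == 0:      # periodic full-gradient anchor refresh
        x_ref = x.copy()
        g_ref = [full_grad(i, x_ref) for i in range(n_clients)]
```

One plausible intuition this sketch offers for the "1 + 1 + 1 is not 3" observation: the SVRG correction only tames the noise from sampling data points, while the compression and client-subsampling steps inject additional variance of their own that the anchor does not cancel. The paper's method and analysis treat this combination rigorously.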