The starting point of this paper is the discovery of a novel and simple error-feedback mechanism, which we call EF21-P, for dealing with the error introduced by a contractive compressor. Unlike all prior works on error feedback, where compression and correction operate in the dual space of gradients, our mechanism operates in the primal space of models. While we believe that EF21-P may be of interest in the many situations where it is advantageous to perturb the model prior to computing the gradient (e.g., randomized smoothing and generalization), in this work we focus on its use as a key building block in the design of communication-efficient distributed optimization methods supporting bidirectional compression. In particular, we employ EF21-P as the mechanism for compressing and subsequently error-correcting the model broadcast by the server to the workers. By combining EF21-P with suitable methods performing worker-to-server compression, we obtain novel methods that support bidirectional compression and enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. For example, our bounds are the first that manage to decouple the variance/error coming from the worker-to-server and server-to-worker compression, transforming a multiplicative dependence into an additive one. In the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.
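To make the idea of primal-space error feedback concrete, the following is a minimal single-node sketch, not the paper's full method. It rests on assumptions that the abstract does not spell out: that the server keeps a model x, that the workers hold a shifted model w, that gradients are evaluated at w, and that the broadcast step takes the form w ← w + C(x − w) for a contractive compressor C. Top-K is used here as one possible choice of C. The names `top_k` and `ef21_p` are hypothetical, and worker-to-server compression is omitted entirely.

```python
import numpy as np


def top_k(v, k):
    """Top-K compressor (assumed choice): keep the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out


def ef21_p(grad, x0, step, k, iters):
    """Sketch of primal error feedback under the assumptions stated above.

    grad : callable returning the gradient at a given point (hypothetical interface)
    x0   : initial model, shared by server and workers
    step : step size
    k    : number of coordinates kept by the compressor
    """
    x = x0.copy()  # exact model kept by the server
    w = x0.copy()  # shifted model held by the workers
    for _ in range(iters):
        # The gradient is computed at the perturbed model w, not at x.
        x = x - step * grad(w)
        # Only the compressed difference is broadcast; the part of x - w
        # that is dropped stays in the gap and is retried in later rounds.
        w = w + top_k(x - w, k)
    return x, w


# Toy problem (assumption for illustration): f(x) = 0.5 * ||x - b||^2.
rng = np.random.default_rng(0)
b = rng.normal(size=10)
x, w = ef21_p(lambda z: z - b, np.zeros(10), step=0.1, k=5, iters=500)
print(np.linalg.norm(x - b), np.linalg.norm(w - b))
```

In this sketch the quantity being compressed is a model difference rather than a gradient, which is the sense in which the correction happens in the primal space.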