We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. Unlike most established approaches, which rely on a static compressor choice (e.g., Top-$K$), our class allows the compressors to {\em evolve} throughout the training process, with the aim of improving the theoretical communication complexity and practical efficiency of the underlying methods. We show that our general approach can recover the recently proposed state-of-the-art error feedback mechanism EF21 (Richt\'arik et al., 2021) and its theoretical properties as a special case, but also leads to a number of new efficient methods. Notably, our approach allows us to improve upon the state of the art in the algorithmic and theoretical foundations of the {\em lazy aggregation} literature (Chen et al., 2018). As a by-product that may be of independent interest, we provide a new and fundamental link between the lazy aggregation and error feedback literature. A special feature of our work is that we do not require the compressors to be unbiased.
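To make the evolving-compressor idea concrete, the following is a minimal Python sketch of the EF21 special case mentioned above, in which the 3PC mechanism reduces to shifting a Top-$K$ compressor by a state $h$ that tracks past gradients. The function names (\texttt{topk}, \texttt{ef21\_step}) and the exact update are our illustrative reconstruction under that reading, not code from the paper.

\begin{verbatim}
import numpy as np

def topk(v, k):
    # Top-K contractive compressor: keep the k largest-magnitude entries.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_step(h, grad, k):
    # One EF21-style update, read as a three point compressor
    # C_{h,y}(x) = h + topk(x - h): the output depends on the point x
    # being compressed and on the state h (in this special case, not
    # on the third point y), and h evolves throughout training.
    message = topk(grad - h, k)   # only k nonzeros are communicated
    h_new = h + message           # worker and server update h identically
    return h_new, message
\end{verbatim}

Because $h$ is updated at every step, the effective compressor changes as training progresses, which is the evolving behavior the 3PC class is meant to capture; by contrast, static Top-$K$ compression of the raw gradient corresponds to resetting $h$ to zero before every step.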