In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that {\em biased} compressors often show superior performance in practice when compared to the much more studied and understood {\em unbiased} compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates both in the single-node and distributed settings. We prove that the distributed compressed SGD method, employed with an error feedback mechanism, enjoys the ergodic rate $\mathcal{O}\left( \delta L \exp\left[-\frac{\mu K}{\delta L}\right] + \frac{(C + \delta D)}{K\mu}\right)$, where $\delta\ge 1$ is a compression parameter which grows when more compression is applied, $L$ and $\mu$ are the smoothness and strong convexity constants, $C$ captures stochastic gradient noise ($C=0$ if full gradients are computed on each node) and $D$ captures the variance of the gradients at the optimum ($D=0$ for over-parameterized models). Further, via a theoretical study of several synthetic and empirical distributions of communicated gradients, we shed light on why and by how much biased compressors outperform their unbiased variants. Finally, we propose several new biased compressors with promising theoretical guarantees and practical performance.
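To make the roles of $\delta$ and of error feedback concrete, the following is a minimal sketch of the standard error feedback recursion with a biased compressor; the symbols $e_t$, $g_t$, $\gamma$ and the particular normalization of $\delta$ are illustrative assumptions rather than the exact formulation used in the paper. The compression error left over from the previous round is added back to the current gradient step before compressing:
\begin{align*}
g_t &= e_t + \gamma \nabla f(x_t), \\
x_{t+1} &= x_t - \mathcal{C}(g_t), \\
e_{t+1} &= g_t - \mathcal{C}(g_t),
\end{align*}
where $\gamma$ is the stepsize and $\mathcal{C}$ is a (possibly biased) compressor assumed to satisfy $\|\mathcal{C}(x)-x\|^2 \le \left(1-\frac{1}{\delta}\right)\|x\|^2$ for all $x$. For example, the Top-$k$ compressor, which keeps the $k$ largest-magnitude coordinates of a $d$-dimensional vector and zeroes the rest, satisfies this bound with $\delta = d/k$, so more aggressive sparsification (smaller $k$) yields a larger $\delta$ and, per the rate above, slower convergence.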