In federated learning (FL), the communication constraint between the remote learners and the Parameter Server (PS) is a crucial bottleneck. For this reason, model updates must be compressed so as to minimize the loss in accuracy resulting from the communication constraint. This paper proposes the ``\emph{${\bf M}$-magnitude weighted $L_{\bf 2}$ distortion + $\bf 2$ degrees of freedom}'' (M22) algorithm, a rate-distortion inspired approach to gradient compression for federated training of deep neural networks (DNNs). In particular, we propose a family of distortion measures between the original gradient and its reconstruction, which we refer to as ``$M$-magnitude weighted $L_2$'' distortion, and we assume that gradient updates are i.i.d. samples from a distribution with two degrees of freedom, either generalized normal or Weibull. The distortion measure and the gradient distribution each have one free parameter that can be fitted as a function of the iteration number. Given a choice of gradient distribution and distortion measure, we design the quantizer that minimizes the expected distortion in gradient reconstruction. To measure gradient compression performance under a communication constraint, we define the \emph{per-bit accuracy} as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choice of gradient distribution and distortion measure. We provide insights into the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion inspired compressor.
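To make the quantizer design step concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not spell out the distortion, so the sketch assumes the per-entry form $d_M(g, \hat{g}) = |g|^M (g - \hat{g})^2$. It substitutes a sample-based Lloyd-style iteration for whatever design procedure the paper actually uses. All function names (`sample_gennorm`, `weighted_distortion`, `quantize`, `design_quantizer`) and parameter values are hypothetical.

```python
import numpy as np

def sample_gennorm(rng, n, alpha=1.0, beta=1.0):
    """Draw n samples with density proportional to exp(-(|x|/alpha)**beta)."""
    # If Y ~ Gamma(1/beta, 1), then alpha * Y**(1/beta) has the right magnitude law.
    y = rng.gamma(shape=1.0 / beta, scale=1.0, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * alpha * y ** (1.0 / beta)

def weighted_distortion(g, g_hat, M):
    """Assumed M-magnitude weighted L2 distortion, averaged over entries."""
    return np.mean(np.abs(g) ** M * (g - g_hat) ** 2)

def quantize(g, centers):
    """Map each entry to its nearest reconstruction point."""
    idx = np.argmin(np.abs(g[:, None] - centers[None, :]), axis=1)
    return centers[idx], idx

def design_quantizer(g, levels, M, iters=50):
    """Lloyd-style design on samples g for the assumed weighted distortion."""
    # Start from evenly spaced empirical quantiles.
    centers = np.quantile(g, (np.arange(levels) + 0.5) / levels)
    w = np.abs(g) ** M  # the weight depends only on the original entry
    for _ in range(iters):
        # Assignment step: the weight is fixed per sample, so nearest point is optimal.
        _, idx = quantize(g, centers)
        # Centroid step: the minimizer in each cell is the weighted mean.
        for k in range(levels):
            mask = idx == k
            wk = w[mask].sum()
            if wk > 0:
                centers[k] = (w[mask] * g[mask]).sum() / wk
    return np.sort(centers)

rng = np.random.default_rng(0)
g = sample_gennorm(rng, 5000, alpha=1.0, beta=1.0)  # beta = 1 is the Laplace case
centers = design_quantizer(g, levels=4, M=2.0)
g_hat, _ = quantize(g, centers)
print(centers, weighted_distortion(g, g_hat, M=2.0))
```

Under this assumed distortion, setting `M = 0` recovers the usual mean-squared-error quantizer, while larger `M` pulls the reconstruction points toward large-magnitude entries. In this sketch `M` and `beta` are fixed by hand; fitting them as a function of the iteration number, as the abstract describes, is not shown.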