In federated learning (FL), a global model is trained at a Parameter Server (PS) by aggregating model updates obtained from multiple remote learners. Generally, the communication between the remote users and the PS is rate-limited, while the transmission from the PS to the remote users is unconstrained. The FL setting thus gives rise to a distributed learning scenario in which the updates from the remote learners must be compressed so as to meet the communication-rate constraints on the uplink transmission toward the PS. For this problem, one wishes to compress the model updates so as to minimize the loss in accuracy resulting from the compression error. In this paper, we take a rate-distortion approach to the compressor design problem for the distributed training of deep neural networks (DNNs). In particular, we define a measure of compression performance under communication-rate constraints -- the \emph{per-bit accuracy} -- which captures the ultimate improvement in accuracy that a bit of communication brings to the centralized model. To maximize the per-bit accuracy, we model the DNN gradient updates at the remote learners as following a generalized normal distribution. Under this assumption on the gradient distribution, we propose a class of distortion measures to aid the design of quantizers for the compression of the model updates. We argue that this family of distortion measures, which we refer to as the "$M$-magnitude weighted $L_2$" norm, captures the practitioner's intuition in the choice of gradient compressor. Numerical simulations on the CIFAR-10 dataset are provided to validate the proposed approach.
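As a minimal illustration of the modeling step summarized above (not the paper's implementation), the sketch below fits a generalized normal distribution to a synthetic vector of gradient values using SciPy's \texttt{gennorm}, and scores a crude one-bit quantizer under a magnitude-weighted squared-error distortion of the assumed form $\sum_i |g_i|^M (g_i - \hat g_i)^2$; the exact definition of the $M$-magnitude weighted $L_2$ measure, the quantizer, and the gradient data are those of the paper, so everything below is a hedged stand-in.

\begin{verbatim}
import numpy as np
from scipy.stats import gennorm

def magnitude_weighted_l2(g, g_hat, M=1.0):
    # Hypothetical form of the "M-magnitude weighted L2" distortion;
    # the definition used in the paper may differ.
    return np.sum(np.abs(g) ** M * (g - g_hat) ** 2)

rng = np.random.default_rng(0)
# Synthetic stand-in for the gradient entries produced at a remote learner.
grads = rng.standard_normal(10_000) * 1e-2

# Fit a generalized normal model: density ~ exp(-|x/scale|^beta),
# where beta = 2 recovers the Gaussian case.
beta, loc, scale = gennorm.fit(grads)
print(f"fitted shape beta={beta:.2f}, scale={scale:.2e}")

# Crude 1-bit-per-entry quantizer (sign times mean magnitude), used only to
# show how a compressor would be scored under the weighted distortion.
g_hat = np.sign(grads) * np.mean(np.abs(grads))
print("weighted distortion:", magnitude_weighted_l2(grads, g_hat, M=1.0))
\end{verbatim}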