In this paper, we introduce $\mathsf{CO}_3$, an algorithm for communication-efficient federated Deep Neural Network (DNN) training. $\mathsf{CO}_3$ takes its name from the three processing steps applied to reduce the communication load when transmitting the local gradients from the remote users to the Parameter Server, namely: (i) gradient quantization through floating-point conversion, (ii) lossless compression of the quantized gradient, and (iii) quantization error correction. We carefully design each of the steps above so as to minimize the loss in distributed DNN training for a fixed communication overhead. In particular, in the design of steps (i) and (ii), we adopt the assumption that DNN gradients are distributed according to a generalized normal distribution. This assumption is validated numerically in the paper. For step (iii), we utilize an error feedback with memory decay mechanism to correct the quantization error introduced in step (i). We argue that the memory decay coefficient, similarly to the learning rate, can be optimally tuned to improve convergence. The performance of $\mathsf{CO}_3$ is validated through numerical simulations and is shown to attain better accuracy and improved stability at a reduced communication payload.
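As a rough illustration of the three steps described above, the following minimal Python sketch runs one compression round on a synthetic gradient. The function name, the uniform scalar quantizer (standing in for the paper's floating-point conversion), the use of zlib (standing in for an entropy coder matched to the assumed generalized-normal statistics), and the parameter values are all illustrative assumptions, not the paper's exact design.

```python
import zlib
import numpy as np

def co3_compress_step(grad, memory, decay=0.9, num_bits=8):
    """Hypothetical sketch of the three CO3-style steps on a local gradient."""
    # (iii-a) apply error feedback: add the decayed quantization-error memory
    corrected = grad + decay * memory

    # (i) coarse quantization; a uniform scalar quantizer is used here as a
    # stand-in for the paper's floating-point conversion
    scale = np.max(np.abs(corrected)) / (2 ** (num_bits - 1) - 1) + 1e-12
    q = np.round(corrected / scale).astype(np.int32)

    # (iii-b) store the new quantization error for the next iteration
    new_memory = corrected - q * scale

    # (ii) lossless compression of the quantized gradient; zlib is a generic
    # placeholder for the entropy coder used in the paper
    payload = zlib.compress(q.tobytes())
    return payload, scale, new_memory

# usage: one round on a synthetic gradient vector
rng = np.random.default_rng(0)
grad = rng.standard_normal(10_000).astype(np.float32)
memory = np.zeros_like(grad)
payload, scale, memory = co3_compress_step(grad, memory)
print(f"compressed bytes: {len(payload)} (raw: {grad.nbytes})")
```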