In this paper, the problem of optimal lossless compression of gradients in Deep Neural Network (DNN) training is considered. Gradient compression is relevant in many distributed DNN training scenarios, including the recently popular federated learning (FL) scenario, in which each remote user is connected to the parameter server (PS) through a noiseless but rate-limited channel. In distributed DNN training, if the underlying gradient distribution is available, classical lossless compression approaches can be used to reduce the number of bits required for communicating the gradient entries. Mean-field analysis suggests that gradient updates can be considered as independent random variables, while the Laplace approximation can be used to argue that the gradient distribution is approximately normal (Norm) in some regimes. In this paper we argue that, for some networks of practical interest, the gradient entries are better modeled as having a generalized normal (GenNorm) distribution. We provide numerical evaluations validating the hypothesis that GenNorm modeling yields a more accurate prediction of the DNN gradient tail distribution. Additionally, this modeling choice provides concrete improvements in lossless compression of the gradients when classical fixed-to-variable lossless coding algorithms, such as Huffman coding, are applied to the quantized gradient updates. This latter result yields an effective compression strategy with low memory and computational complexity that has great practical relevance in distributed DNN training scenarios.
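To make the modeling comparison concrete, the following is a minimal sketch (not code from the paper) of how one might fit both candidate distributions to a vector of gradient entries and compare them by log-likelihood. The synthetic Laplace sample is a stand-in assumption for real DNN gradients, which in practice would be collected from a layer during training; scipy.stats.gennorm recovers the normal at shape parameter beta=2 and the Laplace at beta=1.

```python
import numpy as np
from scipy import stats

# Stand-in gradient sample (assumption for illustration); in practice these
# would be the entries of a DNN layer's gradient at some training step.
rng = np.random.default_rng(0)
grads = rng.laplace(scale=1e-3, size=100_000)

# Fit both candidate models to the same data.
beta, g_loc, g_scale = stats.gennorm.fit(grads)
n_loc, n_scale = stats.norm.fit(grads)

# Compare fits by total log-likelihood; a higher value indicates a better
# match to the empirical distribution, in particular in the tails.
ll_gennorm = stats.gennorm.logpdf(grads, beta, g_loc, g_scale).sum()
ll_norm = stats.norm.logpdf(grads, n_loc, n_scale).sum()
print(f"beta={beta:.3f}  logL(GenNorm)={ll_gennorm:.1f}  logL(Norm)={ll_norm:.1f}")
```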
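Similarly, a minimal sketch of the compression pipeline named above, under assumed parameters: uniform quantization of the gradient entries with a hypothetical step size, followed by a standard Huffman (fixed-to-variable) code built over the resulting symbol frequencies. This is an illustrative instance of the strategy, not the paper's exact implementation.

```python
import heapq
from collections import Counter
import numpy as np

# Stand-in gradient vector (assumption for illustration).
rng = np.random.default_rng(0)
grads = rng.laplace(scale=1e-3, size=100_000)

# Uniform quantization to integer bins; the step size is a hypothetical choice.
step = 1e-4
symbols = np.round(grads / step).astype(int)

def huffman_lengths(counts):
    """Return Huffman codeword lengths per symbol from a symbol->count map."""
    # Each heap entry is (count, tiebreaker id, member symbols of the subtree).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    uid = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Every merge pushes all member symbols one level deeper in the tree,
        # i.e. adds one bit to each of their codewords.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, uid, s1 + s2))
        uid += 1
    return lengths

counts = Counter(symbols.tolist())
lengths = huffman_lengths(counts)
avg_bits = sum(counts[s] * lengths[s] for s in counts) / symbols.size
print(f"{avg_bits:.2f} bits/entry after quantization + Huffman (vs 32 for float32)")
```

When the quantized symbols concentrate on few bins, as heavy-tailed GenNorm-like gradients do around zero, the average Huffman codeword length approaches the empirical entropy, which is the source of the compression gain the abstract refers to.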