This article is set in the context of gradient compression. Gradient compression is a popular technique for mitigating the communication bottleneck that arises when training large machine learning models in a distributed manner with gradient-based methods such as stochastic gradient descent. In this article, assuming a Gaussian distribution for the components of the gradient, we derive the rate-distortion trade-off of gradient quantization schemes such as Scaled-sign and Top-K, and compare it with the Shannon rate-distortion limit. A similar comparison with vector quantizers is also presented.
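To make the two named schemes concrete, the sketch below gives a minimal NumPy implementation of the Scaled-sign and Top-K compressors under their commonly used definitions (Scaled-sign transmits sign bits plus one scale equal to the mean absolute value; Top-K keeps the k largest-magnitude components). The Gaussian test gradient and the function names are illustrative assumptions, not taken from the article itself.

```python
import numpy as np

def scaled_sign(g):
    # Scaled-sign quantizer: one scale (mean absolute value) plus
    # one sign bit per component, i.e. roughly 1 bit per dimension.
    scale = np.mean(np.abs(g))
    return scale * np.sign(g)

def top_k(g, k):
    # Top-K sparsifier: keep the k largest-magnitude components and
    # zero the rest; only the kept values and indices need be sent.
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# Empirical distortion (mean squared error) on a synthetic Gaussian
# gradient, matching the Gaussian assumption made in the article.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
for name, q in [("Scaled-sign", scaled_sign(g)), ("Top-K (k=1000)", top_k(g, 1000))]:
    print(name, "MSE:", np.mean((g - q) ** 2))
```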