Communication overhead is a key challenge in distributed training. Gradient compression is a widely used approach to reducing communication traffic. When combined with parallel communication mechanisms such as pipelining, gradient compression can greatly alleviate the impact of communication overhead. However, two problems with gradient compression remain to be solved. First, compression introduces extra computation, which delays the next training iteration. Second, compression usually degrades convergence accuracy.
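To make the two costs concrete, the sketch below illustrates top-k sparsification, one common gradient compression scheme; the function names and the 1% ratio are illustrative assumptions, not the specific method discussed here. The top-k selection is the extra computation added before communication can begin, and the zeroed-out entries are the source of the accuracy loss.

```python
# A minimal sketch of top-k gradient sparsification, assuming a NumPy gradient
# array; names and the compression ratio are illustrative, not from the paper.
import numpy as np

def topk_compress(grad: np.ndarray, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.
    The argpartition call is the extra computation on the critical path."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of top-k entries
    return idx, flat[idx], grad.shape              # send indices + values only

def topk_decompress(idx, values, shape):
    """Rebuild a dense gradient; dropped entries become zero, which is what
    hurts convergence accuracy unless error feedback is applied."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

# Example: compress a 1M-parameter gradient to roughly 1% of its volume.
g = np.random.randn(1_000_000).astype(np.float32)
idx, vals, shape = topk_compress(g, ratio=0.01)
g_hat = topk_decompress(idx, vals, shape)
```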