Large-scale distributed training is increasingly communication-bound. Many gradient compression algorithms have been proposed to reduce the communication overhead and improve scalability. However, it has been observed that in some cases gradient compression can even harm the performance of distributed training. In this paper, we propose MergeComp, a compression scheduler that optimizes the scalability of communication-efficient distributed training. It automatically schedules the compression operations to optimize the performance of compression algorithms without knowledge of model architectures or system parameters. We have applied MergeComp to nine popular compression algorithms. Our evaluations show that MergeComp can improve the performance of compression algorithms by up to 3.83x without losing accuracy. It can even achieve a scaling factor of up to 99% for distributed training over high-speed networks.
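To make the scheduling idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of merging per-layer gradients into a single buffer before compression, so that one compression call amortizes its fixed per-call overhead; the top-k sparsifier and the function names `topk_compress` and `merge_and_compress` are illustrative assumptions only.

```python
# Hypothetical sketch of the merge-then-compress idea: small per-layer
# gradients are concatenated into one buffer and compressed with a single
# call, amortizing per-call overhead. Top-k sparsification is used purely
# as an example compressor.
import numpy as np

def topk_compress(flat_grad: np.ndarray, ratio: float = 0.01):
    """Keep the `ratio` fraction of entries with the largest magnitude."""
    k = max(1, int(flat_grad.size * ratio))
    idx = np.argpartition(np.abs(flat_grad), -k)[-k:]
    return idx, flat_grad[idx]

def merge_and_compress(per_layer_grads, ratio: float = 0.01):
    """Merge per-layer gradients into one buffer, then compress once."""
    sizes = [g.size for g in per_layer_grads]
    merged = np.concatenate([g.ravel() for g in per_layer_grads])
    idx, vals = topk_compress(merged, ratio)
    # `sizes` lets the receiver split the reconstructed buffer back per layer.
    return idx, vals, sizes

# Example: three layers of different sizes, compressed with a single call.
grads = [np.random.randn(1000), np.random.randn(50), np.random.randn(200000)]
idx, vals, sizes = merge_and_compress(grads, ratio=0.01)
print(f"kept {vals.size} of {sum(sizes)} values")
```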