Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead of exchanging information, such as stochastic gradients, across the workers. Among the many techniques proposed to remedy this issue, one of the most successful is the framework of compressed communication with error feedback (EF). EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$. In this paper, we propose a new alternative to EF for dealing with contractive compressors, one that is better both in theory and in practice. In particular, we devise a construction which can transform any contractive compressor into an induced unbiased compressor. Following this transformation, existing methods able to work with unbiased compressors can be applied. We show that our approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees, and fewer assumptions. We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits thereof. We perform several numerical experiments that validate our theoretical findings.
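To make the idea of such a transformation concrete, the sketch below pairs a contractive Top-$K$ compressor with an unbiased Rand-$K$ compressor applied to the compression residual: since the residual is estimated without bias, the combined operator is unbiased. This is only a minimal illustration under our own assumptions; the function names (top_k, rand_k, induced_compressor) and the choice of Rand-$K$ for the residual are hypothetical and not taken from the paper.

```python
import numpy as np

def top_k(x, k):
    """Contractive (biased) Top-K compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def rand_k(x, k, rng):
    """Unbiased Rand-K compressor: keep k uniformly random entries,
    scaled by d/k so that E[rand_k(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def induced_compressor(x, k, rng):
    """Hypothetical induced unbiased compressor: apply the contractive
    compressor, then add an unbiased compression of the residual, so that
    E[C(x)] = top_k(x) + E[rand_k(x - top_k(x))] = x."""
    c1 = top_k(x, k)
    return c1 + rand_k(x - c1, k, rng)

# Empirical sanity check of unbiasedness (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
est = np.mean([induced_compressor(x, 10, rng) for _ in range(20000)], axis=0)
print(np.abs(est - x).max())  # small: the sample average recovers x, consistent with E[C(x)] = x
```

A method designed for unbiased compressors can then be run with this induced compressor as a drop-in replacement, without maintaining the per-worker error accumulator that EF requires.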