We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression. To obtain this improvement, we design MCM, an algorithm such that the downlink compression only impacts local models, while the global model is preserved. As a result, and contrary to previous works, the gradients on the local workers are computed on perturbed models. Consequently, convergence proofs are more challenging and require a precise control of this perturbation. To ensure it, MCM additionally combines model compression with a memory mechanism. This analysis opens new doors, e.g., to incorporating worker-dependent randomized models and partial participation.
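To make the mechanism described above concrete, the toy sketch below simulates, on a single machine, the kind of round the abstract outlines: the server keeps an exact global model, broadcasts only a compressed correction around a downlink memory (so workers hold perturbed models), and aggregates compressed gradients computed at those perturbed models. This is an illustrative sketch, not the paper's implementation; the compressor (rand_k), the memory step size alpha_dwn, and all function names are assumptions introduced here.

```python
import numpy as np

# Illustrative sketch only: a toy, single-process simulation of the update pattern
# the abstract describes (uplink + downlink compression, perturbed local models,
# downlink memory). Names such as rand_k and alpha_dwn are assumptions, not the
# paper's code.

def rand_k(v, k):
    """Unbiased sparsification: keep k random coordinates, rescaled to stay unbiased."""
    d = v.size
    idx = np.random.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def bidirectional_step(w, H, grad_at, workers, lr, alpha_dwn, k_up, k_dwn):
    """One round: workers see a compressed (perturbed) model; the server keeps w exact."""
    # Downlink: broadcast only a compressed correction around the memory H,
    # so each worker reconstructs a perturbed model w_hat while the server preserves w.
    delta = rand_k(w - H, k_dwn)
    w_hat = H + delta
    H_new = H + alpha_dwn * delta  # memory mechanism keeps the perturbation under control
    # Uplink: each worker compresses its stochastic gradient computed at the perturbed model.
    g = np.mean([rand_k(grad_at(w_hat, i), k_up) for i in workers], axis=0)
    # Server update applied to the preserved (uncompressed) global model.
    return w - lr * g, H_new
```

The key design point the abstract emphasizes is visible in the last line: the compressed correction only affects w_hat (the workers' view), never the server's iterate w, which is why the analysis must track gradients evaluated at perturbed models.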