We introduce a framework - Artemis - to tackle the problem of learning in a distributed or federated setting with communication constraints and device partial participation. Several workers (randomly sampled) perform the optimization process using a central server to aggregate their computations. To alleviate the communication cost, Artemis compresses the information sent in both directions (from the workers to the server and conversely), combined with a memory mechanism. It improves on existing algorithms that only consider unidirectional compression (to the server), or use very strong assumptions on the compression operator, and often do not take device partial participation into account. We provide fast rates of convergence (linear up to a threshold) under weak assumptions on the stochastic gradients (noise variance bounded only at the optimal point) in the non-i.i.d. setting, highlight the impact of memory for unidirectional and bidirectional compression, and analyze Polyak-Ruppert averaging. We use convergence in distribution to obtain a lower bound on the asymptotic variance that highlights practical limits of compression. Finally, we provide experimental results to demonstrate the validity of our analysis.
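To make the idea of bidirectional compression with a memory mechanism concrete, here is a minimal, self-contained Python sketch of one optimization round in the spirit described above. It is an illustrative toy, not the paper's exact Artemis update: the compression operator (a hypothetical unbiased rand-k sparsifier), the memory step size alpha, the learning rate, and the quadratic toy objective are all assumptions chosen for readability, and partial participation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsification: keep k random coordinates, rescaled by d/k."""
    d = v.size
    out = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_round(w, grads, memories, lr=0.1, alpha=0.5, k=2):
    """One simulated round of doubly compressed SGD with worker-side memory (toy sketch).

    grads    : list of stochastic gradients g_i evaluated at w
    memories : list of memory vectors h_i, updated in place
    """
    estimates = []
    for i, g in enumerate(grads):
        delta = rand_k(g - memories[i], k)          # uplink: compress g_i - h_i
        estimates.append(memories[i] + delta)       # server reconstructs h_i + delta_i
        memories[i] = memories[i] + alpha * delta   # memory update
    aggregate = np.mean(estimates, axis=0)
    update = rand_k(aggregate, k)                   # downlink: compress the aggregate
    return w - lr * update

# toy usage: each worker i holds f_i(w) = 0.5 * ||w - x_i||^2, gradients are noisy
d, n = 10, 3
targets = [rng.normal(size=d) for _ in range(n)]
w = np.zeros(d)
memories = [np.zeros(d) for _ in range(n)]
for _ in range(200):
    grads = [w - x + 0.01 * rng.normal(size=d) for x in targets]
    w = compressed_round(w, grads, memories)
print("distance to mean target:", np.linalg.norm(w - np.mean(targets, axis=0)))
```

In this sketch the memory vectors h_i track each worker's gradient at the current iterate, so the compressed uplink messages shrink as the method approaches the optimum; this is the intuition behind the memory mechanism in the non-i.i.d. setting, though the precise update rules and guarantees are those stated in the paper.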