We introduce a framework - Artemis - to tackle the problem of learning in a distributed or federated setting with communication constraints and partial device participation. Several (randomly sampled) workers perform the optimization process using a central server to aggregate their computations. To alleviate the communication cost, Artemis allows compressing the information sent in both directions (from the workers to the server and conversely), combined with a memory mechanism. It improves on existing algorithms that only consider unidirectional compression (to the server), use very strong assumptions on the compression operator, and often do not take partial device participation into account. We provide fast rates of convergence (linear up to a threshold) under weak assumptions on the stochastic gradients (noise variance bounded only at the optimal point) in the non-i.i.d. setting, highlight the impact of memory for unidirectional and bidirectional compression, and analyze Polyak-Ruppert averaging. We use convergence in distribution to obtain a lower bound on the asymptotic variance that highlights the practical limits of compression. We propose two approaches to tackle the challenging case of partial device participation and provide experimental results demonstrating the validity of our analysis.
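To make the bidirectional compression and memory mechanism described above concrete, the following is a minimal sketch of one server step, written in Python. It assumes a simple unbiased random-quantization operator and a per-worker memory update of the form h_i <- h_i + alpha * C(g_i - h_i); the function names, the specific compressor, and the exact recursion are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

def rand_quantize(v, rng):
    """Hypothetical unbiased compression operator C: keeps each coordinate
    with probability |v_j| / ||v|| and rescales so that E[C(v)] = v."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v.copy()
    probs = np.abs(v) / norm
    mask = rng.random(v.shape) < probs
    return norm * np.sign(v) * mask

def bidir_compressed_step(w, worker_grads, memories, lr, alpha, rng):
    """One sketched iteration: each sampled worker compresses the difference
    between its stochastic gradient and its local memory (uplink), the server
    averages the reconstructed gradients, then compresses the broadcast update
    (downlink). `memories` is the list of per-worker memory vectors h_i."""
    reconstructed = []
    for i, g in enumerate(worker_grads):
        # uplink: send only the compressed innovation g_i - h_i
        c = rand_quantize(g - memories[i], rng)
        reconstructed.append(memories[i] + c)
        memories[i] = memories[i] + alpha * c  # memory update (assumed rule)
    avg = np.mean(reconstructed, axis=0)
    # downlink: the server also compresses what it sends back to the workers
    update = rand_quantize(avg, rng)
    return w - lr * update, memories
```

The memory term is what makes compression of the innovation (rather than the raw gradient) effective in the non-i.i.d. setting: as the iterates converge, g_i - h_i shrinks, so the compression error vanishes at the optimum instead of scaling with the heterogeneous local gradients.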