We develop and analyze MARINA: a new communication-efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences that is reminiscent of, but different from, the strategy employed in the DIANA method of Mishchenko et al. (2019). Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. The communication complexity bounds we prove for MARINA are evidently better than those of all previous first-order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The first method is designed for the case when the local loss functions owned by clients are either of a finite-sum or of an expectation form, and the second method allows for partial participation of clients, a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of oracle/communication complexity. Finally, we provide a convergence analysis of all methods for problems satisfying the Polyak-Łojasiewicz condition.
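The core idea highlighted in the abstract, compressing gradient differences and only occasionally sending full gradients, which yields a biased gradient estimator, can be illustrated with a minimal sketch. The helper names (`rand_k`, `marina_step`) and the choice of a rand-k compressor are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rand_k(v, k, rng):
    """Rand-k sparsification: keep k random coordinates, rescaled so the
    compressor is unbiased in expectation (an assumed example compressor)."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

def marina_step(x, g, local_grads, lr, p, k, rng):
    """One MARINA-style iteration (sketch, hypothetical interface).

    x           : current iterate x^k
    g           : current gradient estimator g^k shared by all workers
    local_grads : callable local_grads(x) -> list of local gradients [∇f_i(x)]
    lr          : step size
    p           : probability of an uncompressed synchronization round
    k           : coordinates kept by the rand-k compressor
    """
    x_new = x - lr * g
    old_grads = local_grads(x)
    new_grads = local_grads(x_new)
    if rng.random() < p:
        # Rare full round: workers send uncompressed gradients.
        g_new = np.mean(new_grads, axis=0)
    else:
        # Usual round: each worker compresses only its gradient *difference*;
        # the aggregated estimator g_new is biased for ∇f(x_new).
        diffs = [rand_k(gn - go, k, rng) for gn, go in zip(new_grads, old_grads)]
        g_new = g + np.mean(diffs, axis=0)
    return x_new, g_new
```

Sending compressed differences rather than compressed gradients is what keeps the per-round communication small while the occasional full round controls the accumulated estimator error.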