We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing faster convergence and reduced communication relative to previous methods.
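To make the quantization idea concrete, the following is an illustrative sketch, not a verbatim statement of the paper's method; the quantization operators $Q_P$ and $Q_d$, the iterate notation $x_t$, and the objective $f$ are introduced here purely for exposition. A quantized Newton-type step of the kind described above could take the form
\[
x_{t+1} \;=\; x_t \;-\; Q_d\!\left( \Big[\, Q_P\!\left(\nabla^2 f(x_t)\right) \Big]^{-1} \nabla f(x_t) \right),
\]
where $Q_P$ quantizes the preconditioner (here, the Hessian) and $Q_d$ quantizes the resulting descent direction before it is communicated. As the abstract indicates, both quantizers must be chosen accurately enough to control the convergence rate while still reducing the number of bits exchanged per step.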