Distributed Second Order Methods with Fast Rates and Compressed Communication

We develop several new communication-efficient second-order methods for distributed optimization. Our first method, NEWTON-STAR, is a variant of Newton's method from which it inherits a fast local quadratic rate. However, unlike Newton's method, NEWTON-STAR enjoys the same per-iteration communication cost as gradient descent. While this method is impractical, as it relies on certain unknown parameters characterizing the Hessian of the objective function at the optimum, it serves as the starting point that enables us to design practical variants with strong theoretical guarantees. In particular, we design a stochastic sparsification strategy for learning the unknown parameters iteratively and in a communication-efficient manner. Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN, for which we prove local linear and superlinear rates independent of the condition number. When applicable, this method can converge dramatically faster than state-of-the-art methods. Finally, we develop a globalization strategy based on cubic regularization, which leads to our third method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, as well as a fast superlinear rate. Our theoretical results are supported by experiments on real datasets, which show improvements of several orders of magnitude in communication complexity over baseline and state-of-the-art methods.
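To illustrate why NEWTON-STAR can match the per-iteration communication of gradient descent, here is a minimal sketch. It assumes the update takes the form x^{k+1} = x^k - [∇²f(x*)]^{-1} ∇f(x^k), with ∇f(x^k) the average of the workers' local gradients, so each worker ships only a d-vector per step. The toy problem (L2-regularized logistic regression split across 4 workers), the regularization weight `LAM`, the synthetic data, and every function name here are hypothetical, chosen for illustration and not taken from the paper. The sketch also "cheats" exactly where the abstract says the method is impractical: it first computes x* and the Hessian there with classical Newton.

```python
import math
import random

LAM = 0.1  # hypothetical L2 weight; keeps the toy objective strongly convex

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def local_grad(data, x):
    """Gradient of f_i(x) = mean log(1 + exp(-b a^T x)) + LAM/2 ||x||^2."""
    g = [LAM * xj for xj in x]
    for a, b in data:
        s = sigmoid(-b * dot(a, x))
        for j in range(len(x)):
            g[j] -= b * s * a[j] / len(data)
    return g

def local_hess(data, x):
    """Hessian of f_i: LAM*I + mean of s(1 - s) a a^T, with s = sigmoid(a^T x)."""
    d = len(x)
    H = [[LAM if j == k else 0.0 for k in range(d)] for j in range(d)]
    for a, _ in data:
        s = sigmoid(dot(a, x))
        w = s * (1.0 - s) / len(data)
        for j in range(d):
            for k in range(d):
                H[j][k] += w * a[j] * a[k]
    return H

def avg_vec(vs):
    # Server-side average of the vectors sent by the workers.
    return [sum(col) / len(vs) for col in zip(*vs)]

def avg_mat(Ms):
    # Server-side average of matrices, row by row.
    return [avg_vec(rows) for rows in zip(*Ms)]

def solve(H, g):
    """Solve H z = g by Gaussian elimination with partial pivoting."""
    d = len(g)
    M = [row[:] + [gi] for row, gi in zip(H, g)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * d
    for r in range(d - 1, -1, -1):
        z[r] = (M[r][d] - sum(M[r][k] * z[k] for k in range(r + 1, d))) / M[r][r]
    return z

def newton_star(workers, x0, H_star, iters):
    """x <- x - H_star^{-1} mean_i grad f_i(x); each worker sends only d floats."""
    x, floats_sent = list(x0), 0
    for _ in range(iters):
        g = avg_vec([local_grad(data, x) for data in workers])
        floats_sent += len(workers) * len(x)
        x = [xj - sj for xj, sj in zip(x, solve(H_star, g))]
    return x, floats_sent

# Hypothetical toy data: n = 4 workers, m = 10 samples each, d = 3 features.
rng = random.Random(0)
d, n, m = 3, 4, 10
w_true = [1.0, -2.0, 0.5]
workers = []
for _ in range(n):
    data = []
    for _ in range(m):
        a = [rng.uniform(-1, 1) for _ in range(d)]
        data.append((a, 1.0 if dot(a, w_true) + rng.gauss(0, 0.5) > 0 else -1.0))
    workers.append(data)

# Oracle (the impractical part): find x* with classical Newton, which ships d*d
# Hessian entries per worker, then freeze H* = Hessian of f at x*.
x_star = [0.0] * d
for _ in range(30):
    g = avg_vec([local_grad(data, x_star) for data in workers])
    H = avg_mat([local_hess(data, x_star) for data in workers])
    x_star = [xj - sj for xj, sj in zip(x_star, solve(H, g))]
H_star = avg_mat([local_hess(data, x_star) for data in workers])

x0 = [xj + 0.2 for xj in x_star]  # start nearby: the fast rate is local
x_final, sent = newton_star(workers, x0, H_star, iters=15)
grad = avg_vec([local_grad(data, x_final) for data in workers])
grad_norm = math.sqrt(sum(gj * gj for gj in grad))
print(f"||grad f(x)|| = {grad_norm:.2e}, floats sent = {sent}")
```

In this sketch each NEWTON-STAR iteration moves n·d floats (the local gradients), whereas the classical Newton oracle moves an extra n·d² Hessian entries per step. NEWTON-LEARN, as described in the abstract, replaces the frozen `H_star` with parameters learned through compressed updates; the sketch does not attempt that part.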

翻译：我们开发了几种新的通信效率第二阶方法来进行分配优化。我们的第一种方法, Newton- STAR, 是牛顿方法的变种, 它从中继承其快速的本地四象速率。然而, 与牛顿方法不同, Newton-STAR在循环通信成本方面享有与梯度下降相同的同样的权利。虽然这种方法不切实际, 因为它依赖于使用某些未知的参数来描述目标函数的最佳程度, 但这种方法可以作为起点, 使我们能够设计实用的变种, 并有很强的理论保证。特别是, 我们设计了一种随机化战略, 以便以迭代方式学习未知的参数。将这一战略应用到牛顿- STAR, 导致我们下一个方法, 即 Newton- LEARN, 以坡度下降速度, 我们证明是本地线性和超线率, 以及全球水平的实验性水平, 证明了全球水平的直线和超标度, 证明了全球水平和实验性水平的精确率, 我们的精确和实验性水平的精确率, 证明了全球水平的精确率和实验性水平的精确率, 水平的精确率。