We consider a standard distributed optimisation setting in which $N$ machines, each holding a local function $f_i$ over a $d$-dimensional domain, aim to jointly minimise the sum of the functions $\sum_{i = 1}^N f_i(x)$. This problem arises naturally in large-scale distributed optimisation, where a standard solution is to apply variants of (stochastic) gradient descent. We focus on the communication complexity of this problem: our main result provides the first fully unconditional bounds on the total number of bits that need to be sent and received by the $N$ machines to solve this problem under point-to-point communication, within a given error tolerance. Specifically, we show that $\Omega\!\left(Nd \log \tfrac{d}{N\varepsilon}\right)$ total bits need to be communicated between the machines to find an additive $\varepsilon$-approximation to the minimum of $\sum_{i = 1}^N f_i(x)$. The result holds for both deterministic and randomised algorithms and, importantly, requires no assumptions on the algorithm's structure. The lower bound is tight under certain restrictions on the parameter values, and it is matched within constant factors for quadratic objectives by a new variant of quantised gradient descent, which we describe and analyse. Our results bring tools from communication complexity to bear on distributed optimisation, an approach with potential for further applications.
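To illustrate the algorithmic side, a minimal sketch of one quantised gradient descent iteration might take the following form, where $Q(\cdot)$ denotes a generic unbiased stochastic quantisation operator and $\gamma > 0$ a step size; both are illustrative assumptions, and the precise scheme analysed for quadratic objectives may differ:
\[
x_{t+1} \;=\; x_t \;-\; \gamma \sum_{i = 1}^N Q\big(\nabla f_i(x_t)\big),
\]
with each machine transmitting only the quantised vector $Q(\nabla f_i(x_t))$, encoded in a bounded number of bits per iteration, rather than the full-precision gradient.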