In this paper, we consider distributed optimization problems where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that invokes the random reshuffling (RR) update at each agent. We show that D-RR inherits favorable characteristics of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves an $\mathcal{O}(1/T^2)$ rate of convergence (where $T$ counts the number of epochs) in terms of the squared distance between the iterate and the global minimizer. When the objective function is smooth and nonconvex, we show that D-RR drives the squared norm of the gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors) and outperform those of the distributed stochastic gradient descent (DSGD) algorithm when the number of epochs is relatively large. Finally, we conduct a set of numerical experiments to illustrate the efficiency of the proposed D-RR method on both strongly convex and nonconvex distributed optimization problems.
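To make the algorithmic idea concrete, the following is a minimal sketch of a D-RR-style loop: at every epoch each agent independently reshuffles its local samples, takes incremental gradient steps through the shuffled samples, and mixes its iterate with neighbors through a mixing matrix. The function and variable names (`grad`, `W`, `step_size`, etc.) are illustrative assumptions, a constant step size is used for simplicity, and the exact update order and step-size schedule in the paper may differ.

```python
# Hedged sketch of a D-RR-style update (illustrative, not the paper's exact pseudocode).
import numpy as np

def d_rr(grad, data, W, x0, step_size, num_epochs, rng=None):
    """grad(x, sample): gradient of one local component function at x.
    data[i]: agent i's list of local samples (assumed equal length across agents).
    W: doubly stochastic mixing matrix consistent with the connected network.
    x0[i]: agent i's initial iterate."""
    rng = rng or np.random.default_rng()
    n = len(data)
    m = len(data[0])                                   # local sample count (assumed equal)
    x = [np.array(x0[i], dtype=float) for i in range(n)]
    for _ in range(num_epochs):
        # each agent draws an independent random permutation of its local samples
        perms = [rng.permutation(m) for _ in range(n)]
        for k in range(m):
            # local incremental gradient step on the k-th reshuffled sample
            half = [x[i] - step_size * grad(x[i], data[i][perms[i][k]])
                    for i in range(n)]
            # consensus step: average with neighbors according to W
            x = [sum(W[i][j] * half[j] for j in range(n)) for i in range(n)]
    return x
```

In this sketch, the only randomness per epoch is the per-agent permutation, which is what distinguishes the reshuffling update from the uniform-with-replacement sampling used by DSGD.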