In this paper, we consider the distributed optimization problem where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a connected network. To solve the problem, we propose a distributed random reshuffling (D-RR) algorithm that combines the classical distributed gradient descent (DGD) method and Random Reshuffling (RR). We show that D-RR inherits the superiority of RR for both smooth strongly convex and smooth nonconvex objective functions. In particular, for smooth strongly convex objective functions, D-RR achieves an $\mathcal{O}(1/T^2)$ rate of convergence (here, $T$ counts the total number of iterations) in terms of the squared distance between the iterate and the unique minimizer. When the objective function is smooth nonconvex with Lipschitz continuous component functions, we show that D-RR drives the squared norm of the gradient to $0$ at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of centralized RR (up to constant factors).
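To make the combination concrete, the following is a minimal Python/NumPy sketch of one way a DGD-style mixing step can be interleaved with per-agent random reshuffling. The quadratic component functions, ring network, mixing weights, and step size are illustrative assumptions for this sketch, not the paper's setup or experiments.

```python
import numpy as np

# Hypothetical problem sizes (illustrative only): n agents, m local
# component functions per agent, d-dimensional decision variable.
n, m, d = 5, 20, 3
rng = np.random.default_rng(0)

# Synthetic strongly convex components: f_{i,l}(x) = 0.5 * ||A[i,l] x - b[i,l]||^2.
A = rng.standard_normal((n, m, d, d)) + 2 * np.eye(d)
b = rng.standard_normal((n, m, d))

def grad(i, l, x):
    """Gradient of the l-th component function held by agent i."""
    return A[i, l].T @ (A[i, l] @ x - b[i, l])

# Doubly stochastic mixing matrix for a ring network (weights chosen for
# illustration; any mixing matrix consistent with the graph would do).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n] = 1 / 3
    W[i, (i + 1) % n] = 1 / 3

X = np.zeros((n, d))   # row i is agent i's local iterate
alpha = 0.01           # constant step size, assumed for the sketch

for epoch in range(200):
    # Random Reshuffling: each agent draws its own permutation per epoch
    # and sweeps through its local components in that order.
    perms = [rng.permutation(m) for _ in range(n)]
    for j in range(m):
        # Local gradient step on the j-th reshuffled component ...
        G = np.stack([grad(i, perms[i][j], X[i]) for i in range(n)])
        # ... followed by averaging with neighbors (DGD-style mixing).
        X = W @ (X - alpha * G)
```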