Decentralized optimization is effective at reducing communication in large-scale machine learning. Although numerous algorithms have been proposed with theoretical guarantees and empirical success, the performance limits of decentralized optimization, especially the influence of the network topology and its associated weight matrix on the optimal convergence rate, are not fully understood. While Lu and Sa (2021) recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear. This paper revisits non-convex stochastic decentralized optimization and establishes an optimal convergence rate with general weight matrices. In addition, we establish the optimal rate when the non-convex loss functions further satisfy the Polyak-Lojasiewicz (PL) condition. Existing lines of analysis in the literature cannot achieve these results; instead, we leverage the Ring-Lattice graph to admit general weight matrices while maintaining the optimal relation between the graph diameter and the weight-matrix connectivity. Lastly, we develop a new decentralized algorithm that nearly attains the above two optimal rates under additional mild conditions.
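To make the Ring-Lattice construction concrete, the following is a minimal sketch (not the paper's construction) of building a symmetric, doubly stochastic weight matrix over a ring-lattice graph, where each node links to its k nearest neighbors on each side; the Metropolis-style weights and the function name `ring_lattice_weights` are illustrative assumptions. The second-largest eigenvalue magnitude of such a matrix is the standard measure of weight-matrix connectivity referred to above.

```python
import numpy as np

def ring_lattice_weights(n, k):
    """Illustrative mixing matrix for a ring-lattice graph:
    node i is linked to its k nearest neighbors on each side.
    Uses Metropolis-style weights (all degrees equal on this graph)."""
    W = np.zeros((n, n))
    deg = 2 * k  # every node has the same degree on a ring-lattice
    for i in range(n):
        for d in range(1, k + 1):
            for j in ((i + d) % n, (i - d) % n):
                W[i, j] = 1.0 / (deg + 1)  # equal weight to each neighbor
        W[i, i] = 1.0 - W[i].sum()        # self-weight makes rows sum to 1
    return W

n, k = 20, 2
W = ring_lattice_weights(n, k)
# W is symmetric and doubly stochastic, hence a valid gossip/mixing matrix.
assert np.allclose(W, W.T) and np.allclose(W.sum(axis=1), 1.0)
# The spectral gap 1 - |lambda_2| governs how fast consensus averaging mixes.
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
print(f"second-largest eigenvalue magnitude: {eigs[1]:.4f}")
```

Increasing k shrinks the graph diameter and enlarges the spectral gap, which is the diameter-connectivity trade-off the analysis exploits.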