A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning

We study the consensus decentralized optimization problem in which the objective function is the average of $n$ agents' private non-convex cost functions, and the agents can communicate only with their neighbors on a given network topology. We consider the stochastic online setting, where each agent can access only a noisy estimate of its gradient. Many decentralized methods can solve such problems, including EXTRA, Exact-Diffusion/D$^2$, and gradient-tracking. Unlike the well-known $\small \text{DSGD}$ algorithm, these methods have been shown to be robust to the heterogeneity of the local cost functions. However, the established convergence rates for these methods indicate that their sensitivity to the network topology is worse than that of $\small \text{DSGD}$. Such theoretical results imply that these methods can perform much worse than $\small \text{DSGD}$ over sparse networks, which contradicts empirical experiments in which $\small \text{DSGD}$ is observed to be the more sensitive to the network topology. In this work, we study a general stochastic unified decentralized algorithm ($\small\textbf{SUDA}$) that includes the above methods as special cases. We establish the convergence of $\small\textbf{SUDA}$ both in the general non-convex setting and under the Polyak-Łojasiewicz condition. Our results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D$^2$ and gradient-tracking) compared with the existing literature. Moreover, our results show that these methods are less sensitive to the network topology than $\small \text{DSGD}$, which agrees with numerical experiments.
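The robustness to heterogeneity mentioned above can be illustrated with a small toy sketch. It is not the paper's $\small\textbf{SUDA}$ recursion or its experimental setup: it assumes a ring of 8 agents with uniform mixing weights of 1/3, convex quadratic local costs $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, and exact (noise-free) gradients, so the "DSGD" update here is really its deterministic counterpart. The function names, step size, and the adapt-then-combine form of gradient tracking are illustrative assumptions.

```python
# Toy comparison on a ring: noise-free DSGD-style update vs. gradient tracking.
# Assumptions (not from the paper): quadratic costs f_i(x) = 0.5 * (x - b_i)^2,
# uniform ring weights 1/3, exact gradients, constant step size alpha.

def mix(values):
    """One communication round: average with the two ring neighbors."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)]

def grad(x, b):
    """Local gradients of f_i(x_i) = 0.5 * (x_i - b_i)^2."""
    return [xi - bi for xi, bi in zip(x, b)]

def dsgd_noiseless(b, alpha, steps):
    """x_i <- sum_j w_ij x_j - alpha * grad f_i(x_i), with exact gradients."""
    x = [0.0] * len(b)
    for _ in range(steps):
        g = grad(x, b)
        x = [wx - alpha * gi for wx, gi in zip(mix(x), g)]
    return x

def gradient_tracking(b, alpha, steps):
    """Adapt-then-combine gradient tracking; y_i tracks the average gradient."""
    x = [0.0] * len(b)
    g = grad(x, b)
    y = list(g)  # tracker initialized with the local gradients
    for _ in range(steps):
        x_new = mix([xi - alpha * yi for xi, yi in zip(x, y)])
        g_new = grad(x_new, b)
        y = mix([yi + gn - go for yi, gn, go in zip(y, g_new, g)])
        x, g = x_new, g_new
    return x

def max_error(x, target):
    """Largest distance of any agent's iterate from the target."""
    return max(abs(xi - target) for xi in x)

b = [float(i) for i in range(8)]   # heterogeneous local minimizers
target = sum(b) / len(b)           # minimizer of the average cost
dsgd_x = dsgd_noiseless(b, 0.1, 2000)
gt_x = gradient_tracking(b, 0.1, 2000)
dsgd_err = max_error(dsgd_x, target)
gt_err = max_error(gt_x, target)
print("DSGD-style max error:", dsgd_err)
print("gradient-tracking max error:", gt_err)
```

In this toy setting the constant-step DSGD-style iterates settle at points that differ across agents, because each agent keeps being pulled toward its own $b_i$, while the tracking variable removes that bias. The sketch says nothing about the topology-dependent rates that the abstract is concerned with.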

