Decentralized nonconvex optimization has received increasing attention in machine learning in recent years due to its advantages in system robustness, data privacy, and implementation simplicity. However, three fundamental challenges in designing decentralized optimization algorithms are reducing their sample, communication, and memory complexities. In this paper, we propose a \underline{g}radient-\underline{t}racking-based \underline{sto}chastic \underline{r}ecursive \underline{m}omentum (GT-STORM) algorithm for efficiently solving nonconvex optimization problems. We show that, to reach an $\epsilon^2$-stationary solution, the total number of sample evaluations of our algorithm is $\tilde{O}(m^{1/2}\epsilon^{-3})$ and the number of communication rounds is $\tilde{O}(m^{-1/2}\epsilon^{-3})$, which improve upon the $O(\epsilon^{-4})$ sample-evaluation and communication costs of existing decentralized stochastic gradient algorithms. We conduct extensive experiments with a variety of learning models, including non-convex logistic regression and convolutional neural networks, to verify our theoretical findings. Collectively, our results contribute to the state of the art of theories and algorithms for decentralized network optimization.
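As an illustrative sketch only (not necessarily the exact GT-STORM updates), a gradient-tracking step can be combined with a STORM-type recursive-momentum gradient estimator at each node $i$ of an $m$-node network as follows; here $W$ denotes a doubly stochastic mixing matrix, $\beta_t$ a momentum parameter, $\eta_t$ a step size, and $\xi_i^t$ a local random sample, and all symbols are assumptions for illustration rather than the paper's notation:
\begin{align*}
v_i^{t} &= \nabla f_i(x_i^{t};\xi_i^{t}) + (1-\beta_t)\big(v_i^{t-1} - \nabla f_i(x_i^{t-1};\xi_i^{t})\big), && \text{(recursive-momentum gradient estimate)}\\
y_i^{t} &= \textstyle\sum_{j=1}^{m} W_{ij}\, y_j^{t-1} + v_i^{t} - v_i^{t-1}, && \text{(gradient tracking via one communication round)}\\
x_i^{t+1} &= \textstyle\sum_{j=1}^{m} W_{ij}\, x_j^{t} - \eta_t\, y_i^{t}. && \text{(consensus step and local descent)}
\end{align*}
Each iteration uses a single stochastic sample per node and a single round of neighbor communication, which is consistent with the per-iteration sample and communication costs summarized above.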