Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning, where massive data are stored on millions of mobile devices, as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove the $O((\frac{\gamma}{1-\sigma_{\gamma}})^2\sqrt{\frac{L}{\epsilon}})$ and $O((\frac{\gamma}{1-\sigma_{\gamma}})^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon})$ complexities for the practical single-loop accelerated gradient tracking over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_{\gamma}$ are two common constants characterizing the network connectivity, $\epsilon$ is the desired precision, and $L$ and $\mu$ are the smoothness and strong convexity constants, respectively. Our complexities improve significantly over the bounds of $O(\frac{1}{\epsilon^{5/7}})$ and $O((\frac{L}{\mu})^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon})$, respectively, which were proved in the original literature only for static graphs, where $\frac{1}{1-\sigma}$ equals $\frac{\gamma}{1-\sigma_{\gamma}}$ when the network is time-invariant. When combined with a multiple-consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O(\frac{\gamma}{1-\sigma_{\gamma}})$ for the computation and communication complexities, respectively. When the network is static, by employing Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.
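For readers unfamiliar with gradient tracking, the following is a minimal sketch of the plain (non-accelerated) scheme over a time-varying graph. It is not the accelerated single-loop method analyzed in this paper. The quadratic local objectives, the four-node alternating graph sequence, the step size, and all function names are assumptions chosen for illustration. Each of the two graphs is disconnected on its own, but their union over any two consecutive rounds is connected.

```python
import numpy as np


def pairwise_averaging_matrix(n, pairs):
    """Doubly stochastic matrix in which each listed pair of nodes averages its values."""
    W = np.eye(n)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W


def gradient_tracking(grad, x0, mixing_matrices, step_size, num_iters):
    """Plain gradient tracking; row i of x is the local iterate of node i.

    grad(x) returns the stacked local gradients (row i is the gradient of f_i at x[i]).
    The mixing matrix used in round k cycles through `mixing_matrices`.
    """
    x = x0.copy()
    g = grad(x)
    y = g.copy()  # tracker of the average gradient, initialized to the local gradients
    for k in range(num_iters):
        W = mixing_matrices[k % len(mixing_matrices)]
        x_next = W @ x - step_size * y   # mix with neighbors, then step along the tracker
        g_next = grad(x_next)
        y = W @ y + g_next - g           # mix trackers and add the local gradient change
        x, g = x_next, g_next
    return x


# Assumed toy problem: f_i(x) = 0.5 * ||x - b_i||^2, so the global minimizer is mean(b_i).
rng = np.random.default_rng(0)
n, d = 4, 3
b = rng.normal(size=(n, d))


def grad(x):
    return x - b


# Assumed graph sequence: even rounds link (0,1) and (2,3); odd rounds link (1,2) and (3,0).
W_even = pairwise_averaging_matrix(n, [(0, 1), (2, 3)])
W_odd = pairwise_averaging_matrix(n, [(1, 2), (3, 0)])

x_final = gradient_tracking(grad, np.zeros((n, d)), [W_even, W_odd],
                            step_size=0.1, num_iters=300)
print(np.abs(x_final - b.mean(axis=0)).max())
```

On this toy problem every node's iterate should approach the average of the $b_i$, even though no single round uses a connected graph. The tracker `y` is what lets each node follow the network-wide average gradient using only the neighbors it can reach in the current round.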