We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works. The key algorithmic idea that enables this improved complexity is based on the observation that convergence in FL is hampered by two sources of high variance: (i) the global server aggregation step with multiple local updates, exacerbated by client heterogeneity, and (ii) the noise of the local client-level stochastic gradients. By modeling the server aggregation step as a generalized gradient-type update, we propose a variance-reducing momentum-based global update at the server, which, when combined with variance-reduced local updates at the clients, enables \texttt{FedGLOMO} to enjoy an improved convergence rate. Moreover, we derive our results under a novel and more realistic client-heterogeneity assumption which we verify empirically -- unlike prior assumptions that are hard to verify. Our experiments illustrate the intrinsic variance-reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client-drift in heterogeneous data distribution settings and promotes communication efficiency.
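As a rough illustration (this is a sketch in our own notation, not the exact \texttt{FedGLOMO} update rule), a momentum-based variance-reducing server aggregation of the kind described above can be written as
\begin{align*}
\bm{d}_t &= \frac{1}{|\mathcal{S}_t|}\sum_{i \in \mathcal{S}_t} \Delta_i(\bm{x}_t) \;+\; (1-\beta)\Big(\bm{d}_{t-1} - \frac{1}{|\mathcal{S}_t|}\sum_{i \in \mathcal{S}_t} \Delta_i(\bm{x}_{t-1})\Big), \qquad \bm{x}_{t+1} = \bm{x}_t - \eta\, \bm{d}_t,
\end{align*}
where $\Delta_i(\bm{x})$ denotes the aggregated multi-step local update that sampled client $i \in \mathcal{S}_t$ computes starting from the global model $\bm{x}$, $\beta \in (0,1]$ is a momentum parameter, and $\eta$ is the server step size; the correction term involving $\bm{x}_{t-1}$ is what reduces the variance of the global aggregation step across rounds.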