Federated learning is a new distributed learning paradigm that enables efficient training of emerging large-scale machine learning models. In this paper, we consider federated learning on non-convex objectives with compressed communication from the clients to the central server. We propose a novel first-order algorithm (\texttt{FedSTEPH2}) that employs compressed communication and achieves the optimal iteration complexity of $\mathcal{O}(1/\epsilon^{1.5})$ to reach an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq \epsilon$) on smooth non-convex objectives. The proposed scheme is the first algorithm that attains this optimal complexity with compressed communication and without using full client gradients at each communication round. The key idea that enables \texttt{FedSTEPH2} to attain this optimal complexity is the use of judicious momentum terms in both the local client updates and the global server update. As a prequel to \texttt{FedSTEPH2}, we propose \texttt{FedSTEPH}, which employs a momentum term only in the local client updates. We establish that \texttt{FedSTEPH} enjoys improved convergence rates under various non-convex settings (such as the Polyak-\L{}ojasiewicz condition) and with fewer assumptions than prior work.
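To make the role of the two momentum terms concrete, the following is a minimal sketch of a compressed, doubly-momentum update pattern of this flavor; the symbols ($a$, $\beta$, the step sizes $\eta,\gamma$, the compression operator $\mathcal{C}$, and the estimators $\bm{d}_t^i$, $\bm{m}_r$) are illustrative placeholders and not the exact \texttt{FedSTEPH2} recursions:
\begin{align*}
\text{client } i:\quad & \bm{d}_t^i = \nabla F_i(\bm{x}_t^i;\xi_t^i) + (1-a)\big(\bm{d}_{t-1}^i - \nabla F_i(\bm{x}_{t-1}^i;\xi_t^i)\big), \qquad \bm{x}_{t+1}^i = \bm{x}_t^i - \eta\,\bm{d}_t^i,\\
\text{server:}\quad & \bm{m}_{r+1} = \beta\,\bm{m}_r + \frac{1}{n}\sum_{i=1}^{n}\mathcal{C}\big(\bm{x}_{\tau}^i - \bm{x}_r\big), \qquad \bm{x}_{r+1} = \bm{x}_r + \gamma\,\bm{m}_{r+1},
\end{align*}
where each client runs $\tau$ local steps with a momentum-based gradient estimator $\bm{d}_t^i$, transmits only the compressed model difference $\mathcal{C}(\bm{x}_{\tau}^i - \bm{x}_r)$ to the server, and the server aggregates these differences through a second momentum buffer $\bm{m}_r$ before updating the global iterate.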