In this paper, we propose \texttt{FedGLOMO}, the first (first-order) FL algorithm that achieves the optimal iteration complexity (i.e., matching the known lower bound) on smooth non-convex objectives -- without using clients' full gradients in each round. The key algorithmic idea that enables this optimal complexity is the application of judicious momentum terms that promote variance reduction in both the local updates at the clients and the global update at the server. Our algorithm remains provably optimal even with compressed communication between the clients and the server, which is an important consideration in the practical deployment of FL algorithms. Our experiments illustrate the intrinsic variance-reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client drift in heterogeneous data-distribution settings and promotes communication efficiency. As a prequel to \texttt{FedGLOMO}, we propose \texttt{FedLOMO}, which applies momentum only in the local client updates. We establish that \texttt{FedLOMO} enjoys improved convergence rates in common non-convex settings compared to prior work, and under fewer assumptions.
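To make the flavor of such momentum terms concrete, the following is a minimal sketch of a generic variance-reduced (STORM/hybrid-style) momentum estimator for a local client update; it is an illustration only, not necessarily the exact \texttt{FedGLOMO} update, and the step size $\eta$, momentum weight $a_t \in (0,1]$, and stochastic gradient notation $\nabla f_i(\cdot;\xi)$ are assumptions introduced here:
\begin{align*}
d_t^{(i)} &= \nabla f_i\bigl(x_t^{(i)};\xi_t^{(i)}\bigr) + (1-a_t)\Bigl(d_{t-1}^{(i)} - \nabla f_i\bigl(x_{t-1}^{(i)};\xi_t^{(i)}\bigr)\Bigr), \\
x_{t+1}^{(i)} &= x_t^{(i)} - \eta\, d_t^{(i)},
\end{align*}
where evaluating the stochastic gradient at both $x_t^{(i)}$ and $x_{t-1}^{(i)}$ on the same sample $\xi_t^{(i)}$ is what drives the variance reduction; an analogous momentum-corrected combination of the aggregated (possibly compressed) client updates can be applied in the global step at the server.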