In federated learning (FL) systems, e.g., over wireless networks, the communication cost between the clients and the central server is often a bottleneck. To reduce this cost, communication compression has become a popular strategy in the literature. In this paper, we focus on biased gradient compression techniques for non-convex FL problems. In the classical distributed learning setting, error feedback (EF) is a common technique for remedying the downsides of biased gradient compression. In this work, we study a compressed FL scheme equipped with error feedback, named Fed-EF. We further propose two variants, Fed-EF-SGD and Fed-EF-AMS, which differ in the choice of the global model optimizer. We provide a generic theoretical analysis, which shows that directly applying biased compression in FL leads to a non-vanishing bias in the convergence rate. In contrast, the proposed Fed-EF matches the convergence rate of its full-precision FL counterparts under data heterogeneity, with a linear speedup. Moreover, we develop a new analysis of EF under partial client participation, which is an important scenario in FL. We prove that under partial participation, the convergence rate of Fed-EF exhibits an extra slow-down factor due to a so-called "stale error compensation" effect. A numerical study is conducted to support the intuition about how stale error accumulation affects the norm convergence of Fed-EF under partial participation. Finally, we demonstrate that incorporating two-way compression in Fed-EF does not change the convergence results. In summary, our work provides a thorough analysis of error feedback in federated non-convex optimization. Our analysis with partial client participation also offers insights into a theoretical limitation of the error feedback mechanism and possible directions for improvement.
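To make the error feedback mechanism concrete, the following is a minimal sketch of one communication round of a Fed-EF-SGD-style scheme. It is not the paper's algorithm as specified. The top-k compressor, the quadratic client losses, the plain averaging step on the server, and all names (`top_k`, `ef_step`, `fed_ef_round`, `targets`, `errors`) are assumptions made purely for illustration. Each participating client adds its stored compression error to the new local update before compressing, then stores whatever the compressor dropped. Clients that are not sampled keep their old error untouched, which is one way to picture the "stale error compensation" effect under partial participation.

```python
import numpy as np


def top_k(v, k):
    """Biased compressor (assumed for illustration): keep the k
    largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def ef_step(delta, error, k):
    """One error-feedback step on a single client.

    The stored error is added back before compression, and the part
    the compressor discards becomes the new stored error.
    """
    corrected = delta + error
    msg = top_k(corrected, k)
    return msg, corrected - msg


def fed_ef_round(x, targets, errors, participants, k,
                 local_lr=0.1, local_steps=5, global_lr=1.0):
    """One round with partial participation (hypothetical toy setup).

    Client i holds the loss 0.5 * ||y - targets[i]||^2, so its gradient
    is (y - targets[i]). Only sampled clients update their row of
    `errors`; the others keep a stale error for a later round.
    """
    msgs = []
    for i in participants:
        y = x.copy()
        for _ in range(local_steps):
            y -= local_lr * (y - targets[i])  # local gradient step
        msg, errors[i] = ef_step(y - x, errors[i], k)
        msgs.append(msg)
    # Server: SGD-style global update with the averaged compressed deltas.
    return x + global_lr * np.mean(msgs, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_clients, dim = 8, 20
    targets = rng.normal(size=(n_clients, dim))  # heterogeneous client data
    errors = np.zeros((n_clients, dim))
    x = np.zeros(dim)
    for _ in range(200):
        sampled = rng.choice(n_clients, size=4, replace=False)
        x = fed_ef_round(x, targets, errors, sampled, k=2)
    print("gradient norm:", np.linalg.norm(x - targets.mean(axis=0)))
```

In this sketch, each `ef_step` call satisfies `msg + new_error == delta + old_error`, so nothing the compressor drops is lost. It is only delayed until a later round in which the client participates. Replacing the server line with an adaptive update would give a rough analogue of the Fed-EF-AMS variant, though the abstract does not specify that update.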