Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, the huge number, high heterogeneity, and limited availability of clients result in high client variance. This paper addresses these two issues together by proposing compressed and client-variance-reduced methods COFIG and FRECON. We prove an $O(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2})$ bound on the number of communication rounds of COFIG in the nonconvex setting, where $N$ is the total number of clients, $S$ is the number of clients participating in each round, $\epsilon$ is the convergence error, and $\omega$ is the variance parameter associated with the compression operator. For FRECON, we prove an $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2})$ bound on the number of communication rounds. In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which is also the first convergence result for compression schemes that do not communicate with all the clients in each round. We stress that neither COFIG nor FRECON needs to communicate with all the clients, and they enjoy the first or faster convergence results for convex and nonconvex federated learning in the regimes considered. Experimental results point to the empirical superiority of COFIG and FRECON over existing baselines.
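For concreteness, the variance parameter $\omega$ above refers to an unbiased compression operator $\mathcal{C}$ satisfying $\mathbb{E}[\mathcal{C}(x)] = x$ and $\mathbb{E}\|\mathcal{C}(x) - x\|^2 \le \omega \|x\|^2$. The sketch below (not from the paper; the function name `rand_k_compress` is ours) implements a standard example, rand-$k$ sparsification, which keeps $k$ of $d$ coordinates at random and rescales by $d/k$, giving $\omega = d/k - 1$.

```python
import numpy as np

def rand_k_compress(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased rand-k sparsification.

    Keeps k of the d coordinates of x uniformly at random and rescales
    them by d/k, so that E[C(x)] = x and
    E||C(x) - x||^2 <= omega * ||x||^2 with omega = d/k - 1.
    """
    d = x.size
    # Choose k coordinates without replacement.
    kept = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[kept] = (d / k) * x[kept]
    return out

# Usage: compress a hypothetical client gradient of dimension d = 10
# with k = 2, i.e. omega = 10/2 - 1 = 4.
rng = np.random.default_rng(0)
g = rng.standard_normal(10)
c = rand_k_compress(g, k=2, rng=rng)
```

Larger $\omega$ means a more aggressive (cheaper) compressor, which is why it multiplies the communication-round bounds above.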