In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients across epochs (i.e., the gradient-space) in centralized model training, and observe that this gradient-space often consists of a few leading principal components accounting for an overwhelming majority (95-99%) of the explained variance. Motivated by this, we propose the "Look-back Gradient Multiplier" (LBGM) algorithm, which exploits this low-rank property to enable gradient recycling between model update rounds of federated learning, reducing transmissions of large parameters to single scalars for aggregation. We analytically characterize the convergence behavior of LBGM, revealing the nature of the trade-off between communication savings and model performance. Our subsequent experimental results demonstrate the improvement LBGM obtains in communication overhead compared to conventional federated learning on several datasets and deep learning models. Additionally, we show that LBGM is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
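The core mechanism described above — recycling a stored "look-back" gradient and transmitting only a scalar multiplier when the new gradient stays close to that direction — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function name `lbgm_client_step`, the relative-error threshold, and the payload format are all assumptions for illustration.

```python
import numpy as np

def lbgm_client_step(grad, lookback, threshold=0.05):
    """Illustrative sketch of the look-back gradient idea (names assumed).

    If the new gradient is well approximated by a scaling of the stored
    look-back gradient, send only the scalar multiplier; otherwise send
    the full gradient and refresh the look-back vector.
    """
    # Scalar projection coefficient of the new gradient onto the
    # look-back direction.
    rho = grad @ lookback / (lookback @ lookback)
    approx = rho * lookback
    rel_err = np.linalg.norm(grad - approx) / np.linalg.norm(grad)
    if rel_err <= threshold:
        # One scalar is transmitted instead of the full parameter vector.
        return ("scalar", rho), lookback
    # Full gradient is transmitted and becomes the new look-back vector.
    return ("full", grad), grad

# Toy usage: the second gradient is nearly parallel to the stored one,
# so only a scalar needs to be communicated.
lb = np.array([1.0, 2.0, 3.0])
g = 0.9 * lb + np.array([0.0, 0.01, -0.01])
payload, lb_new = lbgm_client_step(g, lb)
```

In this toy run the relative approximation error is well under the threshold, so the client would send a single scalar near 0.9 rather than the full gradient, which is the source of the communication savings the abstract describes.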