Local Stochastic Gradient Descent (SGD) with periodic model averaging (FedAvg) is a foundational algorithm in Federated Learning. The algorithm runs SGD independently on multiple workers and periodically averages the model across all workers. When local SGD runs with many workers, however, the periodic averaging causes a significant model discrepancy across the workers, making the global loss converge slowly. While recent advanced optimization methods tackle this issue with a focus on non-IID settings, the model discrepancy problem persists because of the underlying periodic model averaging. We propose a partial model averaging framework that mitigates the model discrepancy issue in Federated Learning. The partial averaging encourages the local models to stay close to each other in parameter space, enabling more effective minimization of the global loss. Given a fixed number of iterations and a large number of workers (128), partial averaging achieves up to 2.2% higher validation accuracy than periodic full averaging.
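To make the contrast concrete, the sketch below compares periodic full averaging (FedAvg-style synchronization every H steps) with one plausible partial-averaging schedule, in which a different 1/H slice of the parameters is averaged at every iteration in round-robin order, so each parameter is still synchronized once per H steps while the workers are pulled toward each other at every iteration. The round-robin slicing, the function names, and the flat-vector model representation are illustrative assumptions for this sketch, not necessarily the paper's exact formulation.

```python
import numpy as np

def full_averaging(local_models: np.ndarray, step: int, period: int) -> np.ndarray:
    """FedAvg-style synchronization: every `period` steps, replace each
    worker's full parameter vector with the average across workers.
    `local_models` has shape (num_workers, num_params)."""
    if (step + 1) % period == 0:
        avg = np.mean(local_models, axis=0)
        local_models[:] = avg  # broadcast the average to all workers
    return local_models

def partial_averaging(local_models: np.ndarray, step: int, period: int) -> np.ndarray:
    """Illustrative partial averaging (an assumed round-robin schedule):
    each step, average only one 1/period slice of the parameters, so every
    parameter is synchronized once per `period` steps, but some subset of
    the model is averaged at every iteration."""
    num_params = local_models.shape[1]
    chunk = num_params // period  # assumes num_params is divisible by period
    start = (step % period) * chunk
    avg_slice = np.mean(local_models[:, start:start + chunk], axis=0)
    local_models[:, start:start + chunk] = avg_slice
    return local_models

# Minimal usage: 4 workers, an 8-dimensional model, averaging period 4.
rng = np.random.default_rng(0)
models = rng.normal(size=(4, 8))
for step in range(8):
    # (local SGD gradient updates on each worker would go here)
    partial_averaging(models, step, period=4)
```

Note that, amortized over a period of H steps, this schedule communicates the same number of parameters as one full averaging round; the difference is that the synchronization is spread across iterations, which is what keeps the local models closer together in parameter space.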