Data heterogeneity across clients is a key challenge in federated learning. Prior works address this by either aligning client and server models or by using control variates to correct client model drift. Although these methods achieve fast convergence on convex or simple non-convex problems, their performance on over-parameterized models such as deep neural networks is lacking. In this paper, we first revisit the widely used FedAvg algorithm on a deep neural network to understand how data heterogeneity influences the gradient updates across the network's layers. We observe that while the feature-extraction layers are learned effectively by FedAvg, the substantial diversity of the final classification layers across clients impedes performance. Motivated by this, we propose to correct model drift by applying variance reduction only to the final layers. We demonstrate that this approach significantly outperforms existing benchmarks at a similar or lower communication cost. We furthermore provide a proof of the convergence rate of our algorithm.
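The sketch below illustrates the idea stated in the abstract: plain FedAvg updates for the feature-extraction layers combined with SCAFFOLD-style control variates applied only to the final classifier layer. It is a minimal illustration, not the authors' implementation; the network architecture, synthetic client data, and hyperparameters are assumptions made for the example.

```python
# Minimal sketch (assumed setup, not the paper's code): FedAvg on all layers,
# with control-variate (variance-reduction) correction on the classifier only.
import torch
import torch.nn as nn


def make_model():
    # Hypothetical small network: feature extractor + final classifier.
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),   # feature-extraction layers (plain FedAvg)
        nn.Linear(64, 10),              # final classifier (variance-reduced)
    )


def local_update(global_model, data, c_global, c_local, lr=0.01, steps=5):
    """One client's local round with drift correction on the final layer."""
    model = make_model()
    model.load_state_dict(global_model.state_dict())
    clf = model[-1]  # final layer whose drift we correct
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = data
        loss = loss_fn(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad
            # Control-variate correction, applied to the classifier only.
            for p, c_g, c_l in zip(clf.parameters(), c_global, c_local):
                p -= lr * (c_g - c_l)
    # Refresh the local control variate from the observed classifier drift.
    new_c_local = [
        c_l - c_g + (gp.detach() - p.detach()) / (steps * lr)
        for p, gp, c_g, c_l in zip(clf.parameters(), global_model[-1].parameters(),
                                   c_global, c_local)
    ]
    return model.state_dict(), new_c_local


# Server loop over synthetic heterogeneous clients (illustrative data only).
torch.manual_seed(0)
global_model = make_model()
num_clients = 4
clients = [(torch.randn(16, 32), torch.randint(0, 10, (16,)))
           for _ in range(num_clients)]
c_global = [torch.zeros_like(p) for p in global_model[-1].parameters()]
c_locals = [[torch.zeros_like(p) for p in c_global] for _ in range(num_clients)]

for rnd in range(3):
    states, new_cs = [], []
    for i, data in enumerate(clients):
        state, c_i = local_update(global_model, data, c_global, c_locals[i])
        states.append(state)
        new_cs.append(c_i)
    # FedAvg aggregation of all model parameters.
    avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)
    # With full participation, the server control variate is the client average.
    c_global = [torch.stack([c[j] for c in new_cs]).mean(0)
                for j in range(len(c_global))]
    c_locals = new_cs
    print(f"round {rnd} complete")
```

Because only the classifier's control variates are exchanged, the extra communication beyond FedAvg is limited to the final-layer parameters, which is consistent with the abstract's claim of similar or lower communication cost.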