Pre-training is prevalent in today's deep learning as a way to improve the performance of learned models. In the federated learning (FL) literature, however, neural networks are mostly initialized with random weights. This contrast motivated us to conduct a systematic study of pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training not only improves FL, but also closes its accuracy gap to centralized learning, especially in the challenging cases of non-IID client data. To make our findings applicable to situations where pre-trained models are not directly available, we explored pre-training with synthetic data, or even with clients' own data in a decentralized manner, and found that both can already improve FL notably. Interestingly, many of the techniques we explored are complementary and can be combined to further boost performance, which we view as a critical result toward scaling up deep FL for real-world applications. We conclude the paper with an attempt to understand the effect of pre-training on FL. We found that pre-training enables the global models learned under different client data conditions to converge to the same loss basin, and makes global aggregation in FL more stable. Nevertheless, pre-training does not seem to alleviate local model drift, a fundamental problem in FL under non-IID data.
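To make the setup concrete, the sketch below illustrates one FedAvg-style round in which the global model is initialized either randomly or from pre-trained weights. It is a minimal illustration, not the paper's exact configuration: the ResNet-18 backbone, ImageNet weights, 10-class head, client data loaders, and hyperparameters are all assumptions made for the example.

```python
# Minimal FedAvg-style sketch: pre-trained vs. random global initialization.
# Backbone, weights, and hyperparameters are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torchvision


def make_global_model(pretrained: bool) -> nn.Module:
    # ResNet-18 backbone; pretrained=True loads ImageNet weights
    # (the paper's pre-training sources may differ, e.g., synthetic data).
    weights = torchvision.models.ResNet18_Weights.DEFAULT if pretrained else None
    model = torchvision.models.resnet18(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, 10)  # e.g., a 10-class benchmark
    return model


def local_update(global_model, loader, epochs=1, lr=0.01):
    # One client's local SGD pass, starting from the current global weights.
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()


def fedavg_round(global_model, client_loaders):
    # Global aggregation: element-wise average of client models
    # (uniform client weights for simplicity).
    client_states = [local_update(global_model, dl) for dl in client_loaders]
    avg_state = {
        k: torch.stack([s[k].float() for s in client_states]).mean(dim=0)
        for k in client_states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model
```

A typical run would call `make_global_model(pretrained=True)` once, then apply `fedavg_round` for a number of communication rounds over the (possibly non-IID) client loaders; switching `pretrained` to `False` gives the random-initialization baseline that the paper compares against.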