Pre-training is prevalent in modern deep learning as a way to improve the performance of learned models. In the federated learning (FL) literature, however, neural networks are mostly initialized with random weights. This motivated us to conduct a systematic study of pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to centralized learning, especially in the challenging cases of non-IID client data. To make our findings applicable to situations where pre-trained models are not directly available, we explored pre-training with synthetic data, or even with clients' data in a decentralized manner, and found that both can already improve FL notably. Interestingly, many of the techniques we explore are complementary and can be combined to further boost performance, which we view as a critical step toward scaling up deep FL for real-world applications. We conclude the paper with an attempt to understand the effect of pre-training on FL. We found that pre-training enables the global models learned under different client data conditions to converge to the same loss basin, and makes global aggregation in FL more stable. Nevertheless, pre-training does not seem to alleviate local model drift, a fundamental problem of FL under non-IID data.
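To make the setting concrete, below is a minimal, hypothetical FedAvg sketch in NumPy that contrasts the two initializations discussed above: a randomly initialized global model versus one started from "pre-trained" weights. The tiny linear model, the two non-IID clients, and the stand-in pre-trained weights are toy assumptions for illustration only, not the paper's actual benchmarks or models.

```python
# Toy FedAvg sketch (NumPy only): random vs. "pre-trained" initialization.
# Everything here (model, data, pre-trained weights) is a hypothetical stand-in.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent on one client's least-squares task."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(w_init, clients, rounds=20):
    """Standard FedAvg loop: broadcast, local training, data-size-weighted averaging."""
    w = w_init.copy()
    total = sum(len(y) for _, y in clients)
    for _ in range(rounds):
        local = [local_update(w, X, y) for X, y in clients]
        w = sum(len(y) / total * wk for wk, (_, y) in zip(local, clients))
    return w

# Two non-IID clients: same underlying linear task, shifted input distributions.
d = 10
w_true = rng.normal(size=d)
clients = []
for shift in (-1.0, 1.0):
    X = rng.normal(loc=shift, size=(200, d))
    y = X @ w_true + 0.1 * rng.normal(size=200)
    clients.append((X, y))

w_random = rng.normal(size=d)                     # random initialization
w_pretrained = w_true + 0.3 * rng.normal(size=d)  # stand-in for pre-trained weights

for name, w0 in [("random init", w_random), ("pre-trained init", w_pretrained)]:
    w = fedavg(w0, clients)
    print(f"{name:>16}: distance to target weights = {np.linalg.norm(w - w_true):.3f}")
```

In this toy setup the aggregation step is identical in both runs; only the starting point of the global model changes, which is the variable the abstract's comparison between random and pre-trained initialization isolates.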