Federated learning is widely used to learn intelligent models from decentralized data. In federated learning, clients need to communicate their local model updates in each iteration of model learning. However, model updates are large if the model contains numerous parameters, and many rounds of communication are usually needed before the model converges. Thus, the communication cost of federated learning can be quite heavy. In this paper, we propose a communication-efficient federated learning method based on knowledge distillation. Instead of directly communicating the large models between clients and the server, we propose an adaptive mutual distillation framework to reciprocally learn a student and a teacher model on each client, where only the student model is shared across clients and updated collaboratively to reduce the communication cost. Both the teacher and the student on each client are learned from its local data and from the knowledge distilled from each other, where their distillation intensities are controlled by their prediction quality. To further reduce the communication cost, we propose a dynamic gradient approximation method based on singular value decomposition to approximate the exchanged gradients with dynamic precision. Extensive experiments on benchmark datasets for different tasks show that our approach can effectively reduce the communication cost and achieve competitive results.
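To make the gradient-compression idea concrete, the following is a minimal sketch of SVD-based low-rank gradient approximation in Python/NumPy. It is not the paper's exact algorithm: the `energy_threshold` criterion for choosing the retained rank is an assumed stand-in for the dynamic precision control described above, and the function and variable names are illustrative only.

```python
import numpy as np

def approximate_gradient(grad, energy_threshold=0.95):
    """Low-rank approximation of a 2-D gradient matrix via SVD.

    Illustrative sketch: the retained rank k is the smallest rank whose
    singular values cover `energy_threshold` of the total singular-value
    energy. This stands in for the paper's dynamic precision control.
    """
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(energy, energy_threshold)) + 1
    # The client transmits only the factors U_k, s_k, Vt_k
    # (k * (m + n + 1) values instead of m * n).
    return U[:, :k], s[:k], Vt[:k, :]

def reconstruct_gradient(U_k, s_k, Vt_k):
    """Receiver-side reconstruction of the approximated gradient."""
    return (U_k * s_k) @ Vt_k

# Example with a synthetic 256 x 64 gradient that is approximately
# low-rank plus noise, so the factorized payload is much smaller.
grad = np.random.randn(256, 8) @ np.random.randn(8, 64)
grad += 0.01 * np.random.randn(256, 64)
U_k, s_k, Vt_k = approximate_gradient(grad, energy_threshold=0.95)
approx = reconstruct_gradient(U_k, s_k, Vt_k)
payload_ratio = (U_k.size + s_k.size + Vt_k.size) / grad.size
print(f"kept rank {s_k.size}, payload ratio {payload_ratio:.2f}")
```

Raising the energy threshold keeps more singular values and yields a more precise (but larger) update; lowering it compresses more aggressively, which is the trade-off the dynamic-precision scheme exploits over the course of training.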