Federated learning enables multiple clients to collaboratively learn a global model by periodically aggregating the clients' models without transferring the local data. However, due to system and data heterogeneity, many approaches suffer from the "client-drift" issue, which can significantly slow down the convergence of global model training. As clients perform local updates on heterogeneous data through heterogeneous systems, their local models drift apart. To tackle this issue, one intuitive idea is to guide local model training with global teachers, i.e., past global models, so that each client learns global knowledge from past global models via adaptive knowledge distillation. Motivated by these insights, we propose FedGKD, a novel approach for heterogeneous federated learning that fuses the knowledge from historical global models into local training to alleviate the "client-drift" issue. We evaluate FedGKD with extensive experiments on various CV/NLP datasets (i.e., CIFAR-10/100, Tiny-ImageNet, AG News, SST5) and under different heterogeneous settings. The proposed method is guaranteed to converge under common assumptions and achieves superior empirical accuracy in fewer communication rounds than five state-of-the-art methods.
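To make the idea of distilling from historical global models concrete, the sketch below shows a minimal, hypothetical KD-regularized local update; it is not the authors' implementation. It assumes the teacher is a parameter-wise average of the last few global models, and names such as `historical_globals`, `kd_weight`, and `temperature` are illustrative choices, not taken from the paper.

```python
# A minimal sketch (PyTorch) of KD-regularized local training in federated learning.
# Assumption: the "global teacher" is the parameter-wise average of recent global models.
import copy
import torch
import torch.nn.functional as F


def average_models(models):
    """Parameter-wise average of models sharing the same architecture (illustrative teacher)."""
    avg = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, param in avg.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))
    return avg


def local_train(client_model, historical_globals, loader,
                epochs=1, lr=0.01, kd_weight=0.2, temperature=2.0):
    """One client's local update: cross-entropy plus a KD term that pulls the
    local predictions toward the averaged historical global teacher."""
    teacher = average_models(historical_globals)
    teacher.eval()
    opt = torch.optim.SGD(client_model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            logits = client_model(x)
            with torch.no_grad():
                t_logits = teacher(x)
            ce = F.cross_entropy(logits, y)
            kd = F.kl_div(F.log_softmax(logits / temperature, dim=1),
                          F.softmax(t_logits / temperature, dim=1),
                          reduction="batchmean") * temperature ** 2
            loss = ce + kd_weight * kd  # KD term regularizes against client drift
            loss.backward()
            opt.step()
    return client_model
```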