In Federated Learning (FL), a strong global model is collaboratively learned by aggregating the clients' locally trained models. Although this removes the need to access clients' data directly, the global model's convergence often suffers from data heterogeneity. This paper suggests that forgetting could be the bottleneck of global convergence. We observe that fitting on a biased local distribution shifts the features learned on the global distribution and results in forgetting of global knowledge. We view this phenomenon as analogous to Continual Learning, which also faces catastrophic forgetting when a model is fitted on a new task distribution. Based on our findings, we hypothesize that tackling forgetting in local training relieves the data heterogeneity problem. To this end, we propose a simple yet effective framework, Federated Local Self-Distillation (FedLSD), which utilizes the global model's knowledge on locally available data. By following the global perspective on local data, FedLSD encourages the learned features to preserve global knowledge and to stay consistent across local models, thus improving convergence without compromising data privacy. Under our framework, we further extend FedLSD to FedLS-NTD, which considers only the not-true class signals to compensate for the noisy predictions of the global model. We validate that both FedLSD and FedLS-NTD significantly improve performance on standard FL benchmarks in various setups, especially under extreme data heterogeneity.
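The two objectives described above can be sketched as local training losses. This is a minimal NumPy illustration, not the authors' implementation: FedLSD adds a distillation term matching the local model's softened predictions to the frozen global model's predictions on the same local batch, and FedLS-NTD applies the same distillation after dropping the true-class logit so only not-true class signals are matched. The weighting `beta` and temperature `tau` are illustrative hyperparameter names.

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    p = softmax(logits)
    return -np.log(p[np.arange(len(targets)), targets]).mean()

def kl(p, q):
    # mean row-wise KL(p || q); epsilon guards log(0)
    eps = 1e-12
    return (p * (np.log(p + eps) - np.log(q + eps))).sum(axis=1).mean()

def fedlsd_loss(local_logits, global_logits, targets, beta=0.5, tau=3.0):
    # cross-entropy on local labels + distillation toward the global model
    ce = cross_entropy(local_logits, targets)
    kd = kl(softmax(global_logits / tau), softmax(local_logits / tau)) * tau**2
    return (1 - beta) * ce + beta * kd

def fedls_ntd_loss(local_logits, global_logits, targets, beta=0.5, tau=3.0):
    # distill only the not-true classes: remove the true-class logit from
    # both models before renormalizing, so a noisy global prediction on the
    # true class cannot dominate the distillation signal
    n, c = local_logits.shape
    keep = np.ones((n, c), dtype=bool)
    keep[np.arange(n), targets] = False
    nt_local = local_logits[keep].reshape(n, c - 1)
    nt_global = global_logits[keep].reshape(n, c - 1)
    ce = cross_entropy(local_logits, targets)
    kd = kl(softmax(nt_global / tau), softmax(nt_local / tau)) * tau**2
    return (1 - beta) * ce + beta * kd
```

In practice the global logits would come from a frozen copy of the server model evaluated on the client's batch; the distillation term vanishes when the local and global models agree.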