In real-world applications, Federated Learning (FL) faces two challenges: (1) scalability, especially when applied to massive IoT networks; and (2) robustness in environments with heterogeneous data. To address the first challenge, we design a novel FL framework named Full-stack FL (F2L). More specifically, F2L adopts a hierarchical network architecture, so the FL network can be extended without reconstructing the whole network system. Moreover, leveraging the advantages of this hierarchical design, we propose a new label-driven knowledge distillation (LKD) technique at the global server to address the second challenge. In contrast to current knowledge distillation techniques, LKD can train a student model that aggregates the good knowledge from all teacher models. Therefore, our proposed algorithm can effectively extract knowledge about the regions' data distributions (i.e., from the regional aggregated models) to reduce the divergence between clients' models when the FL system operates on non-independent and identically distributed (non-IID) data. Extensive experimental results reveal that: (i) our F2L method significantly improves overall FL efficiency across all global distillation rounds, and (ii) F2L converges rapidly as global distillation stages occur, rather than improving only gradually over each communication cycle.
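To make the idea of label-driven, multi-teacher distillation at the global server concrete, the following is a minimal sketch, not the authors' exact LKD objective. It assumes regional aggregated models act as teachers, a proxy batch `x` is available at the server, and a hypothetical per-class trust matrix `label_weights` (e.g., derived from each region's label distribution) decides which labels each teacher is trusted on.

```python
# Illustrative sketch of multi-teacher, label-weighted knowledge distillation at
# the global server. Names (student, teachers, label_weights) are hypothetical,
# not the paper's exact formulation.
import torch
import torch.nn.functional as F

def label_driven_distillation_step(student, teachers, x, optimizer,
                                   label_weights, temperature=2.0):
    """One distillation step on a proxy batch x.

    label_weights: tensor of shape (num_teachers, num_classes); row r gives how
    much regional teacher r is trusted on each class. Rows are assumed to be
    normalized across teachers.
    """
    student.train()
    student_logits = student(x)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)

    # Soft labels from each regional teacher (no gradients through teachers).
    with torch.no_grad():
        teacher_probs = [F.softmax(t(x) / temperature, dim=1) for t in teachers]

    loss = 0.0
    for r, p_teacher in enumerate(teacher_probs):
        # Per-class weights emphasize the labels this regional teacher knows well,
        # so the student absorbs only the "good" part of each teacher's knowledge.
        w = label_weights[r].unsqueeze(0)                     # (1, num_classes)
        kd = -(w * p_teacher * log_p_student).sum(dim=1).mean()
        loss = loss + kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Weighting each teacher per class, rather than averaging teachers uniformly, is one way to reduce the divergence introduced by non-IID regional data: a teacher only influences the student on labels it has actually seen enough of.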