Federated learning (FL) is a promising paradigm that enables massive numbers of clients to collaboratively learn a shared model while keeping the training data local. However, in many existing FL systems, clients must frequently exchange large volumes of model parameters with a remote cloud server directly over wide-area networks (WANs), leading to significant communication overhead and long transmission times. To mitigate this communication bottleneck, we resort to the hierarchical federated learning paradigm of HiFL, which reaps the benefits of mobile edge computing and combines synchronous client-edge model aggregation with asynchronous edge-cloud model aggregation to greatly reduce the traffic volume over WANs. Specifically, we first theoretically analyze the convergence bound of HiFL and identify the key controllable factors for improving model performance. We then advocate an enhanced design, HiFlash, which innovatively integrates deep reinforcement learning based adaptive staleness control and a heterogeneity-aware client-edge association strategy to boost system efficiency and mitigate the staleness effect without compromising model accuracy. Extensive experiments corroborate the superior performance of HiFlash in model accuracy, communication reduction, and system efficiency.
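The two-tier aggregation scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: `client_update`, the staleness-discounted mixing rule, and all parameter names (`alpha`, `staleness`) are illustrative assumptions modeled on common synchronous FedAvg and asynchronous FL update rules.

```python
import random

def average(models):
    # Element-wise average of a list of parameter vectors
    # (synchronous FedAvg-style aggregation at an edge server).
    return [sum(vals) / len(vals) for vals in zip(*models)]

def client_update(model, lr=0.1):
    # Placeholder for a client's local training step: perturbs each
    # parameter slightly to stand in for local SGD on private data.
    return [w - lr * random.uniform(-1.0, 1.0) for w in model]

def edge_aggregate(edge_model, num_clients):
    # Synchronous client-edge aggregation: the edge waits for all of
    # its associated clients, then averages their local models.
    local_models = [client_update(list(edge_model)) for _ in range(num_clients)]
    return average(local_models)

def cloud_update(cloud_model, edge_model, staleness, alpha=0.5):
    # Asynchronous edge-cloud aggregation: the cloud mixes in an edge
    # model as soon as it arrives, discounting its weight by how stale
    # it is (a common rule in asynchronous FL; not HiFlash's exact one).
    mix = alpha / (1 + staleness)
    return [(1 - mix) * c + mix * e for c, e in zip(cloud_model, edge_model)]
```

In this sketch, the staleness-aware mixing weight shrinks as an edge update grows older, which illustrates why adaptive staleness control matters: stale updates contribute less, trading off convergence speed against the staleness effect.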