Learning-based applications have demonstrated practical use cases in ubiquitous environments and amplified interest in exploiting the data stored on users' mobile devices. Distributed learning algorithms aim to leverage such distributed and diverse data to learn a global phenomenon by performing training among participating devices and repeatedly aggregating their local models' parameters into a global model. Federated learning is a promising paradigm that allows for extending local training on the participant devices before aggregating the parameters, offering better communication efficiency. However, when the participants' data are strongly skewed (i.e., non-IID), the model accuracy can drop significantly. To face this challenge, we leverage the edge computing paradigm to design a hierarchical learning system that performs Federated Gradient Descent on the user-edge layer and Federated Averaging on the edge-cloud layer. In this hierarchical architecture, users are assigned to different edges such that edge-level data distributions become close to IID. We formalize and optimize this user-edge assignment problem to minimize the class-distribution distance between edge nodes, which enhances the Federated Averaging performance. Our experiments on multiple real-world datasets show that the proposed optimized assignment is tractable and leads to faster convergence of models toward better accuracy.
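To make the user-edge assignment idea concrete, the following is a minimal sketch of one possible heuristic: greedily assign each user to the edge whose resulting class histogram stays closest (in L1 distance) to the global class distribution, under an equal-capacity constraint per edge. This is a hypothetical illustration of the problem, not the paper's actual optimization method; all function and variable names are ours.

```python
import math
import numpy as np

def assign_users_to_edges(user_hists, n_edges):
    """Greedy sketch of the user-edge assignment problem.

    user_hists: (n_users, n_classes) array of per-user class counts.
    Returns (assignment, edge_hists), where assignment[u] is the edge
    of user u and edge_hists holds the aggregated class counts per edge.
    """
    user_hists = np.asarray(user_hists, dtype=float)
    n_users, n_classes = user_hists.shape
    cap = math.ceil(n_users / n_edges)  # equal-capacity constraint
    global_dist = user_hists.sum(axis=0) / user_hists.sum()
    edge_hists = np.zeros((n_edges, n_classes))
    edge_load = np.zeros(n_edges, dtype=int)
    assignment = np.empty(n_users, dtype=int)
    # Place users with the most samples first (stable sort for
    # deterministic tie-breaking).
    for u in np.argsort(-user_hists.sum(axis=1), kind="stable"):
        best_e, best_cost = -1, np.inf
        for e in range(n_edges):
            if edge_load[e] >= cap:
                continue  # edge is full
            cand = edge_hists[e] + user_hists[u]
            # L1 distance between the candidate edge distribution
            # and the global class distribution.
            cost = np.abs(cand / cand.sum() - global_dist).sum()
            if cost < best_cost:
                best_cost, best_e = cost, e
        assignment[u] = best_e
        edge_hists[best_e] += user_hists[u]
        edge_load[best_e] += 1
    return assignment, edge_hists
```

For example, with four single-class users (two holding only class 0, two holding only class 1) and two edges, the greedy pairs one user of each class per edge, so each edge-level histogram matches the IID global distribution.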