Federated learning (FL) is a novel distributed machine learning approach that leverages data from Internet of Things (IoT) devices while preserving data privacy. However, current FL algorithms struggle with non-independent and identically distributed (non-IID) data, which inflates communication costs and degrades model accuracy. To address this statistical imbalance, we propose a clustered data sharing framework in which cluster heads share a portion of their data with credible associates via device-to-device (D2D) communication. To dilute the data skew at each node, we formulate the joint clustering and data sharing problem on a privacy-preserving constrained graph. To handle the tight coupling of decisions on this graph, we devise a distribution-based adaptive clustering algorithm (DACA) built on three deduced cluster-forming conditions, which maximizes the yield of data sharing. Experiments show that the proposed framework improves the convergence and accuracy of FL on non-IID datasets under limited communication resources.
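The core idea of diluting non-IID skew can be illustrated with a minimal sketch. Everything below is hypothetical: the label distributions, the sharing budget, and the greedy sending rule are illustrative stand-ins, not the paper's DACA algorithm or its three cluster-forming conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 4

def label_distribution(labels, num_classes=NUM_CLASSES):
    """Empirical class distribution of one device's local labels."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def skew(dist):
    """Total-variation distance to the uniform distribution; 0 = balanced."""
    uniform = np.full_like(dist, 1.0 / len(dist))
    return 0.5 * np.abs(dist - uniform).sum()

# Hypothetical non-IID setup: the cluster head holds mostly class 0,
# its associate holds mostly class 2.
head = rng.choice(NUM_CLASSES, size=400, p=[0.7, 0.1, 0.1, 0.1])
associate = rng.choice(NUM_CLASSES, size=100, p=[0.1, 0.1, 0.7, 0.1])

def share(head_labels, assoc_labels, budget=60):
    """Head sends up to `budget` samples of the classes its associate lacks
    (a simple D2D sharing rule, not the paper's optimized policy)."""
    need = np.maximum(1.0 / NUM_CLASSES - label_distribution(assoc_labels), 0.0)
    need /= need.sum()  # split the budget across underrepresented classes
    sent = []
    for c, frac in enumerate(need):
        pool = head_labels[head_labels == c]
        k = min(int(round(frac * budget)), len(pool))
        sent.append(pool[:k])
    return np.concatenate([assoc_labels, *sent])

after = share(head, associate)
print(skew(label_distribution(associate)), skew(label_distribution(after)))
```

After sharing, the associate's label distribution moves measurably closer to uniform, which is the statistical effect the clustered data sharing framework exploits to improve FL convergence on non-IID data.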