Federated Learning (FL) has shown great potential as a privacy-preserving solution for learning from decentralized data that are accessible only to end devices (i.e., clients). In many scenarios, however, a large proportion of the clients may hold low-quality data that are biased, noisy, or even irrelevant. Such clients can significantly slow the convergence of the global model we aim to build and compromise its quality. In light of this, we propose FedProf, a novel algorithm for optimizing FL under such circumstances without breaching data privacy. The key to our approach is a distributional representation profiling and matching scheme that uses the global model to dynamically profile clients' data representations and enables low-cost, lightweight representation matching. Based on this scheme, we adaptively score each client and adjust its participation probability so as to mitigate the impact of low-value clients on the training process. We have conducted extensive experiments on public datasets under a variety of FL settings. The results show that the selective behaviour of our algorithm significantly reduces the number of communication rounds and the wall-clock time (up to a 2.4x speedup) needed for the global model to converge, while also improving accuracy.
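The abstract describes score-based, probability-weighted client selection driven by representation profiles. Below is a minimal sketch of how such a scheme could look, assuming each client summarizes its data representations (activations under the shared global model) as per-feature Gaussian statistics and the server holds a reference profile; the names `kl_gaussian` and `alpha` and the exponential weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_gaussian(mu_p, sd_p, mu_q, sd_q, eps=1e-8):
    """Closed-form KL divergence between two diagonal Gaussians, summed over features."""
    sd_p, sd_q = sd_p + eps, sd_q + eps
    return np.sum(np.log(sd_q / sd_p)
                  + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5)

def selection_probs(client_profiles, ref_profile, alpha=1.0):
    """Score clients by profile divergence and convert scores to sampling probabilities.

    Clients whose representation profile diverges more from the reference
    (hypothetically: low-value clients) get exponentially lower probability.
    """
    mu_r, sd_r = ref_profile
    divergences = np.array([kl_gaussian(mu, sd, mu_r, sd_r)
                            for mu, sd in client_profiles])
    weights = np.exp(-alpha * divergences)  # penalize divergent profiles
    return weights / weights.sum()

# Example round: sample 10 of 100 clients without replacement.
rng = np.random.default_rng(0)
ref = (np.zeros(64), np.ones(64))  # reference profile held by the server (assumed)
profiles = [(rng.normal(0.0, s, 64), np.full(64, s))  # noisier clients drift further
            for s in rng.uniform(0.8, 3.0, 100)]
p = selection_probs(profiles, ref, alpha=0.5)
selected = rng.choice(100, size=10, replace=False, p=p)
```

Because the profiles are compact per-feature statistics rather than raw data or gradients, matching them against the server's reference is cheap and does not expose individual samples, which is consistent with the low-cost, privacy-preserving matching the abstract claims.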