Federated learning (FL) is a distributed machine learning paradigm that allows clients to collaboratively train a model on their own local data. FL promises to preserve client privacy, and its security can be further strengthened by cryptographic methods such as additively homomorphic encryption (HE). However, the efficiency of FL can suffer severely from statistical heterogeneity, both in the data distribution discrepancy among clients and in the skewness of the global distribution. We mathematically demonstrate the cause of performance degradation in FL and examine the performance of FL over various datasets. To tackle the statistical heterogeneity problem, we propose a pluggable, system-level client selection method named Dubhe, which allows clients to proactively participate in training while preserving their privacy with the assistance of HE. Experimental results show that Dubhe is comparable to the optimal greedy method in classification accuracy, with negligible encryption and communication overhead.
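The abstract's core cryptographic building block is additive homomorphism: a server can aggregate encrypted client statistics without ever seeing the plaintexts. The paper does not name a specific scheme here, so the following is only a minimal illustrative sketch using Paillier encryption via the python-paillier library (`phe`); the key setup and values are hypothetical.

```python
# Minimal sketch of additively homomorphic aggregation (Paillier, via `phe`).
# Assumption: clients (or a key authority) hold the key pair; the server
# operates only on ciphertexts.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its local statistic (e.g., a model-update coordinate).
client_values = [0.31, -0.12, 0.47]
ciphertexts = [public_key.encrypt(v) for v in client_values]

# The server sums ciphertexts without decrypting: additive homomorphism
# guarantees Enc(a) + Enc(b) decrypts to a + b.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = encrypted_sum + c

# Only the key holder recovers the aggregate, never the individual inputs.
assert abs(private_key.decrypt(encrypted_sum) - sum(client_values)) < 1e-9
```

This additive property is what lets a selection or aggregation protocol learn only the sum of client contributions, which is the privacy guarantee the abstract appeals to.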