As a promising learning paradigm integrating computation and communication, federated learning (FL) relies on periodic model sharing among distributed clients. Due to the non-i.i.d. data distribution across clients, FL models suffer from gradient divergence, degraded performance, poor convergence, etc. In this work, we aim to tackle this key issue by adopting data-driven importance sampling (IS) for local training. We propose a trustworthy framework, named importance sampling federated learning (ISFL), which is especially compatible with neural network (NN) models. The framework is evaluated both theoretically and experimentally. First, we derive the parameter deviation bound between ISFL and centralized full-data training to identify the main factors behind the non-i.i.d. dilemmas. We then formulate the selection of the optimal IS weights as an optimization problem and obtain theoretical solutions. We also employ water-filling methods to calculate the IS weights and develop the complete ISFL algorithms. The experimental results on CIFAR-10 fit our proposed theories well and show that ISFL achieves higher performance, as well as better convergence, on non-i.i.d. data. To the best of our knowledge, ISFL is the first non-i.i.d. FL solution from the local sampling perspective that exhibits theoretical NN compatibility. Furthermore, as a local sampling approach, ISFL can be easily migrated into other emerging FL frameworks.
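To make the water-filling step concrete, below is a minimal sketch of a generic water-filling solver in Python. It assumes the optimal IS weights take the standard thresholded form $w_i = \max(0, \mu - c_i)$ under a total-budget constraint, where $c_i$ is a hypothetical per-sample (or per-class) cost term standing in for whatever quantities the paper's optimization problem produces; the abstract does not specify these, so the names `costs` and `budget` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def water_filling(costs: np.ndarray, budget: float, tol: float = 1e-9) -> np.ndarray:
    """Generic water-filling allocation.

    Finds weights w_i = max(0, mu - costs_i) whose sum equals `budget`,
    by bisecting on the water level mu. The total allocation is
    monotonically increasing in mu, so bisection converges.
    """
    lo, hi = costs.min(), costs.max() + budget  # total(lo) = 0, total(hi) >= budget
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        total = np.maximum(mu - costs, 0.0).sum()
        if total > budget:
            hi = mu  # water level too high, lower it
        else:
            lo = mu  # water level too low, raise it
    return np.maximum(0.5 * (lo + hi) - costs, 0.0)

# Illustrative usage: hypothetical per-class costs on a 10-class task
# (e.g. CIFAR-10); weights are then normalized into sampling probabilities.
costs = np.array([0.9, 0.4, 0.7, 0.2, 0.5, 0.8, 0.3, 0.6, 0.1, 0.55])
w = water_filling(costs, budget=1.0)
probs = w / w.sum()
print(probs)
```

Classes with lower cost receive larger weight, and sufficiently costly classes are clipped to zero by the water level, mirroring the closed-form structure water-filling solutions typically exhibit.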