The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data, thereby improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and can even hurt training convergence. Most previous work has focused on differences in label distributions or client shifts. Unlike those settings, we address an important problem of FL that arises, e.g., from different scanners/sensors in medical imaging or different scenery distributions in autonomous driving (highway vs. city), where local clients store examples whose feature distributions differ from those of other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg and the state-of-the-art method for non-iid data (FedProx) in our extensive experiments. These empirical results are supported by a convergence analysis showing, in a simplified setting, that FedBN has a faster convergence rate than FedAvg. Code is available at https://github.com/med-air/FedBN.
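To make the aggregation idea concrete, below is a minimal PyTorch sketch of an aggregation step that keeps batch normalization layers local while averaging all other parameters across clients. This is an illustrative assumption of how such a step could be written, not the authors' reference implementation (see the repository linked above); the helper `fedbn_aggregate` and its interface are hypothetical.

```python
import copy
import torch
import torch.nn as nn

def fedbn_aggregate(client_models):
    """Hypothetical sketch: average all parameters across clients EXCEPT
    BatchNorm parameters/statistics, which stay local to each client.
    `client_models` is a list of nn.Module instances with identical architecture."""
    # Collect state-dict keys that belong to BatchNorm layers; these are kept local.
    bn_keys = set()
    for name, module in client_models[0].named_modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            for key in ("weight", "bias", "running_mean",
                        "running_var", "num_batches_tracked"):
                bn_keys.add(f"{name}.{key}")

    # Average every non-BN entry of the state dicts across clients.
    states = [m.state_dict() for m in client_models]
    avg_state = copy.deepcopy(states[0])
    for key in avg_state:
        if key in bn_keys:
            continue  # skip BN: each client keeps its own local BN layer
        avg_state[key] = torch.mean(
            torch.stack([s[key].float() for s in states]), dim=0
        ).to(states[0][key].dtype)

    # Load the averaged non-BN weights back into each client, restoring
    # that client's own BN values over the shared average.
    for model, local_state in zip(client_models, states):
        new_state = copy.deepcopy(avg_state)
        for key in bn_keys:
            new_state[key] = local_state[key]
        model.load_state_dict(new_state)
```

In this sketch the server-side average never touches BN weights, biases, or running statistics, so each client's normalization remains adapted to its own feature distribution while the remaining layers are shared, matching the feature-shift setting described in the abstract.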