In federated learning (FL), model performance typically suffers from client drift induced by data heterogeneity, and mainstream work focuses on correcting that drift. We propose a different approach, named virtual homogeneity learning (VHL), that directly "rectifies" the data heterogeneity. Specifically, VHL conducts FL with a virtual homogeneous dataset crafted to satisfy two conditions: it contains no private information and it is separable. The virtual dataset can be generated from pure noise shared across clients, and serves to calibrate the features learned from the heterogeneous clients. Theoretically, we prove that VHL enjoys a generalization guarantee on the natural distribution. Empirically, we demonstrate that VHL endows FL with drastically faster convergence and better generalization. To the best of our knowledge, VHL is the first attempt to use a virtual dataset to address data heterogeneity, offering a new and effective means for FL.
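As a rough illustration of the idea, not the paper's exact algorithm, the sketch below shows how a privacy-free, separable virtual dataset could be built from pure noise with a seed shared across clients, and how a client might train on real plus virtual data with a simple feature-calibration term. The per-class Gaussian generator, the MSE class-mean calibration loss, and names such as `make_virtual_dataset` and `local_step` are assumptions for exposition only.

```python
# Hypothetical sketch of the virtual-homogeneity idea: per-class Gaussian
# noise as the shared virtual dataset, plus an MSE feature-calibration loss.
import torch
import torch.nn.functional as F


def make_virtual_dataset(num_classes, per_class, dim, seed=0):
    """Craft a label-separable dataset from pure noise.

    Every client calls this with the same seed, so all clients hold an
    identical virtual dataset containing no private information: class c
    is noise centered at a class-specific mean, making classes separable.
    """
    g = torch.Generator().manual_seed(seed)
    means = 5.0 * torch.randn(num_classes, dim, generator=g)  # well-separated centers
    xs, ys = [], []
    for c in range(num_classes):
        xs.append(means[c] + torch.randn(per_class, dim, generator=g))
        ys.append(torch.full((per_class,), c, dtype=torch.long))
    return torch.cat(xs), torch.cat(ys)


class TinyNet(torch.nn.Module):
    """Toy model exposing both features and logits."""

    def __init__(self, dim, num_classes, hidden=64):
        super().__init__()
        self.body = torch.nn.Sequential(torch.nn.Linear(dim, hidden), torch.nn.ReLU())
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, x):
        feat = self.body(x)
        return feat, self.head(feat)


def local_step(model, opt, real_x, real_y, virt_x, virt_y, lam=1.0):
    """One client update: classify real and virtual data, and pull the
    class-mean features of real data toward those of the shared virtual
    data, calibrating features across heterogeneous clients."""
    opt.zero_grad()
    feat_r, logit_r = model(real_x)
    feat_v, logit_v = model(virt_x)
    cls_loss = F.cross_entropy(logit_r, real_y) + F.cross_entropy(logit_v, virt_y)
    calib = 0.0
    for c in real_y.unique():
        r_mask, v_mask = real_y == c, virt_y == c
        if v_mask.any():
            calib = calib + F.mse_loss(feat_r[r_mask].mean(0), feat_v[v_mask].mean(0))
    loss = cls_loss + lam * calib
    loss.backward()
    opt.step()
    return loss.item()
```

Because the virtual data is derived from noise under a shared seed, anchoring every client's features to the same virtual class centers is what gives the calibration effect; no client ever exchanges real samples.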