The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
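To make the recipe concrete, below is a minimal NumPy sketch of one FedShuffle-style communication round illustrating the first three ingredients (local shuffling, adjustment of local learning rates, and update weighting). The toy linear-regression setup, the hyperparameters, and the exact 1/steps learning-rate scaling are illustrative assumptions rather than the authors' precise algorithm; momentum variance reduction (ingredient 4) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: linear regression with heterogeneous client data sizes.
d = 10
clients = [
    (rng.standard_normal((n, d)), rng.standard_normal(n))
    for n in (20, 50, 200)  # unequal local dataset sizes -> unequal local work
]

def grad(w, X, y):
    """Mini-batch gradient of 0.5 * ||Xw - y||^2 / len(y)."""
    return X.T @ (X @ w - y) / len(y)

def fedshuffle_round(w_global, lr=0.05, epochs=2, batch=10):
    """One communication round sketching ingredients 1-3 from the abstract;
    the specific scaling and weighting choices here are assumptions."""
    total_n = sum(len(y) for _, y in clients)
    agg = np.zeros_like(w_global)
    for X, y in clients:
        n = len(y)
        steps = epochs * int(np.ceil(n / batch))
        local_lr = lr / steps                 # ingredient 2: scale lr by local work
        w = w_global.copy()
        for _ in range(epochs):
            perm = rng.permutation(n)         # ingredient 1: local data shuffling
            for s in range(0, n, batch):
                idx = perm[s:s + batch]
                w -= local_lr * grad(w, X[idx], y[idx])
        agg += (n / total_n) * (w - w_global)  # ingredient 3: update weighting
    return w_global + agg

w = np.zeros(d)
for _ in range(5):
    w = fedshuffle_round(w)
```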