We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view the Federated Learning problem primarily from a communication perspective and allow more device-level computation to save transmission costs. We point out a fundamental dilemma: the minima of the device-level empirical losses are inconsistent with those of the global empirical loss. Unlike recent prior works that either attempt inexact minimization or utilize devices for parallelizing gradient computation, we propose a dynamic regularizer for each device at each round, so that in the limit the global and device solutions are aligned. We demonstrate, both through empirical results on real and synthetic data and through analytical results, that our scheme leads to efficient training in both convex and non-convex settings, while being fully agnostic to device heterogeneity and robust to a large number of devices, partial participation, and unbalanced data.
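As a rough illustration of what a per-device dynamic regularizer can look like, the sketch below augments the device's empirical loss with a round-dependent linear correction and a proximal term anchored at the server model. The state `h_k`, the penalty weight `alpha`, and the function names are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# Minimal sketch of a dynamically regularized device objective.
# Assumptions (not taken from the abstract): the regularizer combines a
# per-device linear-correction state h_k, updated every round, with a
# proximal penalty of strength alpha toward the server model.
import torch


def local_objective(model_params, local_loss, server_params, h_k, alpha=0.01):
    """Device-level objective = empirical risk + dynamic regularizer.

    model_params  : iterable of the device model's parameter tensors
    local_loss    : scalar empirical loss on the device's own data
    server_params : parameters broadcast by the server this round
    h_k           : 1-D per-device state tensor refreshed each round, so that
                    stationary points of this objective can track the global ones
    alpha         : proximal penalty strength (hypothetical default)
    """
    flat = torch.cat([p.reshape(-1) for p in model_params])
    flat_server = torch.cat([p.reshape(-1) for p in server_params])
    # Round-dependent linear correction driven by the device state h_k ...
    linear_term = -torch.dot(h_k, flat)
    # ... plus a proximal term keeping the device model near the server model.
    prox_term = 0.5 * alpha * torch.sum((flat - flat_server) ** 2)
    return local_loss + linear_term + prox_term
```

Under this kind of construction, the regularizer changes from round to round (hence "dynamic"), which is what allows the device-level minimizers to be steered toward the global one rather than toward each device's own biased optimum.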