In the emerging paradigm of Federated Learning (FL), a large number of clients, such as mobile devices, are used to train possibly high-dimensional models on their respective data. Combining (dimension-wise) adaptive gradient methods (e.g., Adam, AMSGrad) with FL has been an active research direction, and has been shown to outperform traditional SGD-based FL in many cases. In this paper, we focus on the problem of training federated deep neural networks, and propose a novel FL framework that further introduces layer-wise adaptivity to the local model updates. Our framework can be applied to locally adaptive FL methods, including two recent algorithms, Mime and Fed-AMS. Theoretically, we provide a convergence analysis of our layer-wise FL methods, coined Fed-LAMB and Mime-LAMB, which matches the convergence rate of state-of-the-art results in FL and exhibits linear speedup in the number of workers. Experimental results on various datasets and models, under both IID and non-IID local data settings, show that both Fed-LAMB and Mime-LAMB achieve faster convergence and better generalization performance compared to various recent adaptive FL methods.
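To illustrate the layer-wise adaptivity discussed above, the following is a minimal sketch of a LAMB-style local step: a dimension-wise adaptive (AMSGrad-type) update that is additionally rescaled per layer by a trust ratio. This is an illustrative sketch under simplified assumptions, not the paper's exact Fed-LAMB or Mime-LAMB algorithm; the function name and hyperparameter defaults are hypothetical.

```python
import numpy as np

def layerwise_adaptive_step(layers, grads, state, lr=1e-3,
                            beta1=0.9, beta2=0.999, eps=1e-8):
    """One LAMB-style local step (illustrative sketch, not the
    authors' exact algorithm): an AMSGrad-type dimension-wise
    update, rescaled per layer by the trust ratio ||w|| / ||u||."""
    new_layers = []
    for i, (w, g) in enumerate(zip(layers, grads)):
        m, v, v_hat = state[i]
        m = beta1 * m + (1 - beta1) * g        # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second moment estimate
        v_hat = np.maximum(v_hat, v)           # AMSGrad max correction
        u = m / (np.sqrt(v_hat) + eps)         # dimension-wise adaptive direction
        # Layer-wise adaptivity: scale the update by the trust ratio,
        # so each layer's step size adapts to its parameter norm.
        trust = np.linalg.norm(w) / max(np.linalg.norm(u), eps)
        new_layers.append(w - lr * trust * u)
        state[i] = (m, v, v_hat)
    return new_layers
```

In a federated setting, each client would run several such local steps on its own data, after which the server aggregates (e.g., averages) the resulting models; the aggregation details of Fed-LAMB and Mime-LAMB are given in the paper.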