Federated learning (FL) is an emerging learning paradigm for tackling massively distributed data. In FL, a set of clients jointly perform a machine learning task under the coordination of a server. The FedAvg algorithm is one of the most widely used methods for solving FL problems. In FedAvg, the learning rate is constant rather than adaptive. Adaptive gradient methods have shown superior performance over constant learning-rate schedules; however, there is still no general framework for incorporating adaptive gradient methods into the federated setting. In this paper, we propose \textbf{FedDA}, a novel framework for local adaptive gradient methods. The framework adopts a restarted dual averaging technique and is flexible with respect to both the gradient estimation method and the adaptive learning-rate formulation. In particular, we analyze \textbf{FedDA-MVR}, an instantiation of our framework, and show that it achieves gradient complexity $\tilde{O}(\epsilon^{-1.5})$ and communication complexity $\tilde{O}(\epsilon^{-1})$ for finding an $\epsilon$-stationary point. This matches the best-known rate for first-order FL algorithms, and \textbf{FedDA-MVR} is the first adaptive FL algorithm to achieve this rate. We also perform extensive numerical experiments to verify the efficacy of our method.
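To make the abstract's main ingredients concrete, the following is a minimal sketch of one communication round combining local dual averaging, an MVR-style (STORM-like) gradient estimator, and an AdaGrad-style adaptive scaling, with the accumulators restarting at each round. The function name \texttt{feddamvr\_round}, the step-size constants, and the exact placement of the estimator update are illustrative assumptions, not the paper's algorithm.

\begin{verbatim}
import numpy as np

def feddamvr_round(x0, client_grads, steps, eta=0.1, a=0.1, eps=1e-8):
    """One illustrative communication round in the spirit of FedDA-MVR.

    Each client runs local dual averaging with an MVR/STORM-style
    gradient estimator; the server then averages the local iterates.
    `client_grads` is a list of gradient oracles grad(x) -> np.ndarray.
    (A faithful MVR step would evaluate both gradients below on the
    same stochastic sample; that detail is elided here.)
    """
    local_iterates = []
    for grad in client_grads:
        x = x0.copy()
        d = grad(x)                   # initial gradient estimate
        z = np.zeros_like(x0)         # dual (gradient) accumulator
        v = np.zeros_like(x0)         # accumulator for adaptive scaling
        for _ in range(steps):
            z += d
            v += d * d
            # Dual-averaging step: move from the round's anchor x0,
            # scaled coordinate-wise by an AdaGrad-style denominator.
            x_prev, x = x, x0 - eta * z / (np.sqrt(v) + eps)
            # MVR update: new gradient plus a damped correction of the
            # previous estimate by the gradient difference.
            d = grad(x) + (1.0 - a) * (d - grad(x_prev))
        local_iterates.append(x)
    # Server averages; dual and scaling accumulators restart next round.
    return np.mean(local_iterates, axis=0)

# Toy usage: two clients with shifted quadratic objectives.
if __name__ == "__main__":
    grads = [lambda x: x - 1.0, lambda x: x + 1.0]
    x = np.zeros(5)
    for _ in range(20):
        x = feddamvr_round(x, grads, steps=10)
    print(x)  # approaches the consensus minimizer (the zero vector)
\end{verbatim}

The restart at every round is what lets the server aggregation interact cleanly with dual averaging: each client measures its local progress from the same shared anchor $x_0$, so averaging the endpoints is meaningful.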