In recent years, distributed optimization has proven to be an effective approach for accelerating the training of large-scale machine learning models such as deep neural networks. With the increasing computational power of GPUs, the bottleneck of distributed training is gradually shifting from computation to communication. Meanwhile, in the hope of training machine learning models on mobile devices, a new distributed training paradigm called ``federated learning'' has become popular. Communication time is especially important in federated learning due to the low bandwidth of mobile devices. While various approaches for improving communication efficiency have been proposed for federated learning, most of them are designed with SGD as the prototype training algorithm. Although adaptive gradient methods have proven effective for training neural networks, the study of adaptive gradient methods in federated learning is scarce. In this paper, we propose an adaptive gradient method that guarantees both convergence and communication efficiency for federated learning.