The federated learning (FL) framework trains a machine learning model on decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular FL optimization algorithms use vanilla (stochastic) gradient descent both for local updates at the clients and for global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have been studied for server updates. However, the effect of using adaptive optimization methods for local updates at the clients is not yet understood. We show, in both theory and practice, that while local adaptive methods can accelerate convergence, they can also cause a non-vanishing solution bias: the final converged solution may differ from a stationary point of the global objective function. We propose correction techniques that overcome this inconsistency and complement local adaptive methods in FL. Extensive experiments on realistic federated training tasks show that the proposed algorithms converge faster and reach higher test accuracy than baselines without local adaptivity.
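To make the setup concrete, below is a minimal sketch of federated averaging in which each client runs a few local AdaGrad steps before the server averages the resulting models. This is not the paper's implementation or its proposed correction; the toy quadratic client objectives, step sizes, and dimensions are hypothetical choices used only to illustrate what "local adaptivity" means and why the averaged model need not sit at a stationary point of the global objective.

```python
# Minimal sketch (assumptions labeled): FedAvg with local AdaGrad updates on
# hypothetical heterogeneous quadratic client objectives f_i(w) = 0.5*||A_i w - b_i||^2.
import numpy as np

rng = np.random.default_rng(0)
dim, num_clients, rounds, local_steps = 5, 4, 200, 5
lr, eps = 0.1, 1e-8  # hypothetical local step size and AdaGrad epsilon

A = [rng.normal(size=(8, dim)) for _ in range(num_clients)]
b = [rng.normal(size=8) for _ in range(num_clients)]

def grad(i, w):
    # Gradient of client i's local loss at w.
    return A[i].T @ (A[i] @ w - b[i])

def global_grad(w):
    # Gradient of the global objective (average of client losses).
    return sum(grad(i, w) for i in range(num_clients)) / num_clients

w_server = np.zeros(dim)
for _ in range(rounds):
    client_models = []
    for i in range(num_clients):
        w = w_server.copy()
        accum = np.zeros(dim)          # per-coordinate AdaGrad accumulator
        for _ in range(local_steps):   # local *adaptive* updates at the client
            g = grad(i, w)
            accum += g ** 2
            w -= lr * g / (np.sqrt(accum) + eps)
        client_models.append(w)
    # Plain FedAvg aggregation: the server averages the locally adapted models.
    w_server = np.mean(client_models, axis=0)

# Because each client applies its own data-dependent preconditioner, the
# averaged model need not be a stationary point of the global objective --
# the kind of solution bias the abstract refers to.
print("Global gradient norm at the aggregate:", np.linalg.norm(global_grad(w_server)))
```

Replacing the local AdaGrad step with plain SGD recovers standard FedAvg, whose fixed point (with matched local steps) does not suffer this preconditioner-induced mismatch; the correction techniques proposed in the paper target exactly this gap while keeping the speedup of local adaptivity.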