Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have had notable success in combating such issues. In this work, we propose federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyze their convergence in the presence of heterogeneous data for general non-convex settings. Our results highlight the interplay between client heterogeneity and communication efficiency. We also perform extensive experiments on these methods and show that the use of adaptive optimizers can significantly improve the performance of federated learning.
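To make the server-side idea concrete, below is a minimal sketch of an adaptive federated round in the FedAdam style: clients run a few local SGD steps and return model deltas, and the server treats the averaged delta as a pseudo-gradient inside an Adam-like update. The function names (`local_sgd`, `fedadam_round`), the least-squares client objectives, and all hyperparameter values are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def local_sgd(x, A, b, lr=0.05, steps=5):
    """A few local SGD steps on one client's least-squares problem; returns the model delta."""
    y = x.copy()
    for _ in range(steps):
        g = A.T @ (A @ y - b) / len(b)   # gradient of 0.5 * ||A y - b||^2 / n
        y -= lr * g
    return y - x                          # delta sent back to the server

def fedadam_round(x, m, v, deltas, server_lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
    """One FedAdam-style server round: the averaged client delta acts as a pseudo-gradient."""
    delta = np.mean(deltas, axis=0)
    m = beta1 * m + (1 - beta1) * delta
    v = beta2 * v + (1 - beta2) * delta ** 2    # a Yogi-style variant would update v with a sign term
    x = x + server_lr * m / (np.sqrt(v) + tau)  # adaptive server step; tau controls per-coordinate scaling
    return x, m, v

# Toy simulation with heterogeneous clients (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
dim, n_clients = 5, 10
clients = [(rng.normal(size=(20, dim)), rng.normal(size=20) + i) for i in range(n_clients)]
x, m, v = np.zeros(dim), np.zeros(dim), np.zeros(dim)
for _ in range(50):
    deltas = [local_sgd(x, A, b) for A, b in clients]
    x, m, v = fedadam_round(x, m, v, deltas)
```

Swapping the second-moment update in `fedadam_round` for Adagrad- or Yogi-style accumulation yields the other adaptive variants; only the server step changes, so client computation and communication cost stay the same.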