The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping their training data private on the clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings to improve convergence and accuracy. However, there is still a paucity of theoretical principles on where and how to design and utilize adaptive optimization methods in federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs). First, an analytic framework is established that connects federated optimization methods to decompositions of the ODEs of the corresponding centralized optimizers. Second, based on this framework, a momentum-decoupling adaptive optimization method, FedDA, is developed to fully utilize the global momentum on each local iteration and accelerate training convergence. Last but not least, full-batch gradients are utilized to mimic centralized optimization at the end of the training process, ensuring convergence and overcoming the possible inconsistency caused by adaptive optimization methods.
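To make the momentum-decoupling idea concrete, the following minimal sketch (in PyTorch, with hypothetical names such as `local_update` and `global_momentum`) illustrates a client-side step that blends a frozen copy of the server-side momentum buffer into every local gradient step; it is an illustration of the general idea under these assumptions, not the paper's exact FedDA update.

```python
import torch

def local_update(model, data_loader, loss_fn, global_momentum, lr=0.01, beta=0.9):
    """One client's local training round (hypothetical sketch).

    On every local step the client combines its local stochastic gradient
    with a frozen copy of the server-side (global) momentum buffer, instead
    of rebuilding momentum from scratch locally.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    for x, y in data_loader:
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for p, g, m in zip(params, grads, global_momentum):
                # Descent direction: local gradient blended with global momentum.
                p -= lr * (g + beta * m)
    # Return the updated parameters for aggregation on the server.
    return [p.detach().clone() for p in params]
```

In this sketch, `global_momentum` is assumed to be a list of tensors broadcast by the server and shaped like the model parameters; the server would aggregate the returned client parameters and refresh its momentum buffer between rounds.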