Composition optimization has recently arisen in many machine learning applications such as meta learning and reinforcement learning. Many composition optimization algorithms have recently been proposed and studied; however, few adaptive algorithms consider composition optimization in the distributed setting. Meanwhile, the existing distributed composition optimization methods still suffer from high sample and communication complexities. Thus, in this paper, we develop a class of faster momentum-based federated compositional gradient descent algorithms (i.e., MFCGD and AdaMFCGD) to solve nonconvex distributed composition problems, building on momentum-based variance reduction and local-SGD techniques. In particular, our adaptive algorithm (i.e., AdaMFCGD) uses a unified adaptive matrix to flexibly incorporate various adaptive learning rates. Moreover, we provide a solid theoretical analysis of our algorithms under the non-i.i.d. setting, and prove that our algorithms simultaneously achieve lower sample and communication complexities than the existing federated compositional algorithms. Specifically, our algorithms attain a sample complexity of $\tilde{O}(\epsilon^{-3})$ and a communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point. We conduct experiments on robust federated learning and distributed meta learning tasks to demonstrate the efficiency of our algorithms.
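To make the two techniques named above concrete, the following is a minimal single-worker NumPy sketch of a STORM-style momentum-based variance-reduced gradient estimator for a two-level composition $F(x) = \mathbb{E}[f(\mathbb{E}[g(x)])]$, with a diagonal adaptive matrix scaling the update. The toy functions, step sizes, and momentum parameters are illustrative assumptions and not the paper's MFCGD/AdaMFCGD pseudocode; in the federated setting each client would run such local updates and periodically average its iterate with the server (the local-SGD part).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # problem dimension

# Toy stochastic composition F(x) = E[f(E[g(x)])] with
# g(x; xi) = A x + b + noise and f(y; zeta) = 0.5 ||y||^2 + noise (assumed, for illustration).
A, b = rng.standard_normal((d, d)), rng.standard_normal(d)

def g(x, noise):        # one stochastic evaluation of the inner mapping
    return A @ x + b + 0.1 * noise

def jac_g(x, noise):    # matching stochastic Jacobian of the inner mapping
    return A + 0.1 * noise

def grad_f(y, noise):   # stochastic gradient of the outer function
    return y + 0.1 * noise

x = rng.standard_normal(d)
u = g(x, rng.standard_normal(d))                      # running estimate of g(x)
v = jac_g(x, rng.standard_normal((d, d))).T @ grad_f(u, rng.standard_normal(d))
h = np.ones(d)                                        # diagonal adaptive matrix

eta, alpha, beta, eps = 0.05, 0.5, 0.5, 1e-6
for t in range(200):
    x_new = x - eta * v / (np.sqrt(h) + eps)          # adaptively scaled update

    # STORM-style corrections: evaluate the new and old iterates on the SAME
    # sample so the difference terms cancel noise (the variance-reduction step).
    n_g = rng.standard_normal(d)
    u = g(x_new, n_g) + (1 - alpha) * (u - g(x, n_g))

    n_j, n_f = rng.standard_normal((d, d)), rng.standard_normal(d)
    grad_new = jac_g(x_new, n_j).T @ grad_f(u, n_f)
    grad_old = jac_g(x, n_j).T @ grad_f(u, n_f)
    v = grad_new + (1 - beta) * (v - grad_old)

    # Adam-like second-moment accumulation for the adaptive matrix.
    h = 0.9 * h + 0.1 * v ** 2
    x = x_new

print("final gradient-estimate norm:", np.linalg.norm(v))
```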