Federated learning enables a large number of edge computing devices to jointly learn a model without sharing data. As a leading algorithm in this setting, Federated Averaging (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the local models only once in a while, has been widely used due to its simplicity and low communication cost. However, despite recent research efforts, its theoretical analysis under assumptions other than standard smoothness is still lacking. In this paper, we analyze the convergence of FedAvg. Different from existing work, we relax the assumption of strong smoothness. More specifically, we assume semi-smoothness and semi-Lipschitz properties for the loss function, whose definitions contain an additional first-order term. In addition, we assume a bound on the gradient that is weaker than the bounded-gradient assumption commonly used in convergence analyses. Under these relaxed assumptions, this paper provides a theoretical convergence study of Federated Learning.
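The FedAvg procedure described above (clients run SGD in parallel on their local data and communicate only through periodic model averaging) can be summarized by the following minimal sketch in Python/NumPy. All function names, hyperparameter values, and the toy least-squares setup are illustrative assumptions and are not taken from the paper itself.

```python
import numpy as np

def local_sgd(w, data, grad_fn, lr=0.05, local_steps=5):
    """Run a few SGD steps on one client's local data, starting from the global model w."""
    w = w.copy()
    for _ in range(local_steps):
        x, y = data[np.random.randint(len(data))]   # sample one local example
        w -= lr * grad_fn(w, x, y)                  # one SGD step on the local loss
    return w

def fedavg(w0, clients_data, grad_fn, rounds=100, lr=0.05, local_steps=5):
    """FedAvg sketch: parallel local SGD with only periodic averaging of client models."""
    w = w0.copy()
    for _ in range(rounds):
        # each client independently runs local SGD from the current global model
        local_models = [local_sgd(w, data, grad_fn, lr, local_steps)
                        for data in clients_data]
        # communication happens only here: the server averages the client models
        w = np.mean(local_models, axis=0)
    return w

if __name__ == "__main__":
    # hypothetical usage: noiseless least-squares data split across 4 clients
    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -1.0])
    def grad_fn(w, x, y):                           # gradient of 0.5 * (x @ w - y)^2
        return (x @ w - y) * x
    clients = [[(x, x @ w_true) for x in rng.normal(size=(20, 2))] for _ in range(4)]
    w_hat = fedavg(np.zeros(2), clients, grad_fn)
    print(w_hat)                                    # should be close to w_true
```

The key point reflected in the sketch is that averaging happens only once per round rather than after every SGD step, which is what keeps the communication cost of FedAvg low.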