Federated Averaging (FedAvg) remains the most popular algorithm for Federated Learning (FL) optimization due to its simple implementation, stateless nature, and privacy guarantees when combined with secure aggregation. Recent work has sought to generalize the vanilla averaging in FedAvg into a gradient descent step by treating client updates as pseudo-gradients and using a server step size. While a server step size has been shown to provide performance improvements theoretically, this practical benefit has not been observed in most existing works. In this work, we present FedExP, a method that adaptively determines the server step size in FL based on the dynamically varying pseudo-gradients throughout the FL process. We begin by considering the overparameterized convex regime, where we reveal an interesting similarity between FedAvg and the Projection Onto Convex Sets (POCS) algorithm. We then show how FedExP can be motivated as a novel extension of the extrapolation mechanism used to speed up POCS. Our theoretical analysis also discusses the implications of FedExP in underparameterized and non-convex settings. Experimental results show that FedExP consistently converges faster than FedAvg and competing baselines on a range of realistic FL datasets.
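As a rough illustration of the adaptive server step size described above, the sketch below shows what a FedExP-style server round could look like. This is a minimal Python sketch, not the paper's reference implementation: the function name fedexp_server_update, the uniform client weighting, and the small constant eps are assumptions made here for illustration, and the exact step-size rule is defined in the paper itself.

```python
import numpy as np

def fedexp_server_update(global_w, client_ws, eps=1e-3):
    """One FL round at the server with an extrapolated (adaptive) step size.

    Illustrative sketch only: clients are weighted uniformly, and the
    step-size rule below is one plausible extrapolation-style choice,
    lower-bounded by 1 so the update never moves less than plain FedAvg.
    """
    # Pseudo-gradients: how far each client's local model moved away
    # from the current global model.
    deltas = [global_w - w for w in client_ws]
    delta_bar = np.mean(deltas, axis=0)  # averaged pseudo-gradient

    # Adaptive server step size: large when client updates are individually
    # large but partially cancel in the average (||delta_bar|| small),
    # and clipped to 1 otherwise.
    avg_sq_norm = np.mean([np.dot(d, d) for d in deltas])
    eta_g = max(1.0, avg_sq_norm / (2.0 * (np.dot(delta_bar, delta_bar) + eps)))

    return global_w - eta_g * delta_bar  # eta_g = 1 recovers FedAvg

# Toy usage with random vectors standing in for model parameters.
rng = np.random.default_rng(0)
w = rng.normal(size=10)
client_models = [w - 0.1 * rng.normal(size=10) for _ in range(4)]
w_next = fedexp_server_update(w, client_models)
```

Because the step size is computed from quantities the server already aggregates, a rule of this form keeps the method stateless on the client side, consistent with the FedAvg properties highlighted above.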