Federated learning enables a large number of edge computing devices to jointly learn a centralized model while keeping all local data on the edge devices. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the devices and averages the resulting model parameters only once in a while. Despite its simplicity, \texttt{FedAvg} lacks theoretical guarantees in the federated setting. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data. We investigate the effect of different sampling and averaging schemes, which are especially crucial when the data are unbalanced. We prove a convergence rate of $\mathcal{O}(\frac{1}{T})$ for \texttt{FedAvg} with proper sampling and averaging schemes on strongly convex and smooth problems, where $T$ is the total number of SGD steps. Our results show that data heterogeneity slows down convergence, which is intrinsic to the federated setting, and that a low device participation rate can be tolerated without severely harming the optimization process. We also show that there is a trade-off between communication efficiency and convergence rate, and we analyze the necessity of learning rate decay by taking linear regression as an example. Our work serves as a guideline for algorithm design in applications of federated learning, where heterogeneous and unbalanced data are the common case.
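To make the algorithmic setup concrete, the following is a minimal, hypothetical sketch of the \texttt{FedAvg} loop described above: in each round a subset of devices is sampled, each sampled device runs a few local SGD steps, and the server averages the returned models. All concrete choices here (the toy least-squares local objectives, the constants \texttt{K}, \texttt{E}, \texttt{eta}, \texttt{S}, and the "sample with probabilities $p_k$, then average uniformly" scheme) are illustrative assumptions for this sketch, not the paper's exact setup.

\begin{verbatim}
import numpy as np

# Minimal FedAvg sketch (illustrative only): K devices with non-iid local data,
# partial participation of S devices per round, E local SGD steps between
# communication rounds, and periodic averaging of the local models.

rng = np.random.default_rng(0)
K, d = 10, 5                     # number of devices, model dimension
E, rounds, eta = 5, 50, 0.05     # local steps, communication rounds, step size
S = 4                            # devices sampled per round (partial participation)

# Heterogeneous local data: each device k holds its own least-squares problem.
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [A[k] @ rng.normal(size=d) + 0.1 * rng.normal(size=20) for k in range(K)]
n = np.array([len(b[k]) for k in range(K)], dtype=float)
p = n / n.sum()                  # device weights proportional to local data size

def local_sgd(w, k):
    """Run E steps of SGD on device k's local least-squares objective."""
    for _ in range(E):
        i = rng.integers(len(b[k]))                  # pick one local sample
        grad = (A[k][i] @ w - b[k][i]) * A[k][i]     # stochastic gradient
        w = w - eta * grad
    return w

w = np.zeros(d)                  # global model
for t in range(rounds):
    # Sample S devices with probabilities p_k (with replacement), then
    # average the returned local models uniformly -- one of several possible
    # sampling/averaging schemes.
    chosen = rng.choice(K, size=S, replace=True, p=p)
    local_models = [local_sgd(w.copy(), k) for k in chosen]
    w = np.mean(local_models, axis=0)  # periodic averaging

print("final global model:", w)
\end{verbatim}

In this sketch, increasing \texttt{E} reduces communication (fewer averaging rounds for the same number of SGD steps) but lets the local models drift apart on heterogeneous data, which is one way to see the communication-versus-convergence trade-off discussed above.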