In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $\epsilon$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, the convergence analysis of FedAve only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often also smooth in the training data. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in parameter, $\eta$-H\"older smoothness in data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, i.e., when the exponent $\Theta(d/\eta)$ drops below $3/4$, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.
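To make the two-phase procedure above concrete, the following is a minimal Python sketch under toy assumptions: quadratic local losses and an affine gradient surrogate fit by least squares. All names here (`client_grad`, `pivots`, `surrogate_grad`) are illustrative, and this is not the paper's exact FedLRGD procedure; it only mimics the pattern of a few gradient-collection rounds followed by server-side inexact gradient descent.

```python
import numpy as np

# Toy federated ERM instance (all dimensions and names are illustrative):
# client i holds a quadratic local loss f_i(theta) = 0.5*||A_i theta - b_i||^2.
rng = np.random.default_rng(0)
m, p = 5, 10                      # number of clients, parameter dimension
r = p + 2                         # number of "pivot" parameters (a few rounds)
A = [rng.standard_normal((20, p)) for _ in range(m)]
b = [rng.standard_normal(20) for _ in range(m)]

def client_grad(i, theta):
    """Exact local gradient; in a real deployment this runs at client i."""
    return A[i].T @ (A[i] @ theta - b[i])

# Phase 1: a few communication rounds. The server broadcasts r pivot
# parameters and each client returns its local gradient there; the server
# averages the replies into a matrix of pivot gradients.
pivots = rng.standard_normal((r, p))
G = np.stack([np.mean([client_grad(i, t) for i in range(m)], axis=0)
              for t in pivots])                      # shape (r, p)

# The server learns weights from the pivot gradients. Here the surrogate is
# affine, g(theta) ~ W @ theta + c, recovered by least squares; this stands
# in for the low rank approximation that smoothness in data enables.
X = np.hstack([pivots, np.ones((r, 1))])             # affine features, (r, p+1)
coef, *_ = np.linalg.lstsq(X, G, rcond=None)         # shape (p+1, p)
W, c = coef[:p].T, coef[p]

def surrogate_grad(theta):
    """Server-side inexact gradient: no further communication needed."""
    return W @ theta + c

# Phase 2: inexact gradient descent entirely at the server.
theta = np.zeros(p)
for _ in range(300):
    theta -= 0.01 * surrogate_grad(theta)

# Sanity check against the true full gradient at the final iterate.
true_grad = np.mean([client_grad(i, theta) for i in range(m)], axis=0)
print("||true full gradient at final theta|| =", np.linalg.norm(true_grad))
```

In this toy instance the surrogate happens to be exact because the losses are quadratic; the point of the sketch is the communication pattern, in which all client interaction is front-loaded into Phase 1 and the descent itself incurs no further rounds.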