Federated learning, in which algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, the local devices communicate along an underlying graph structure $G$. In this work, we consider parameter estimation in federated learning under data-distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing the distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to estimate the parameters of all devices jointly under the $M$-estimation framework with fused Lasso regularization, which encourages equal parameter estimates on devices connected in $G$. We establish a general result for our estimator that depends on $G$ and can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, under a certain graph fidelity condition on $G$, our estimator attains the optimal rate, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to restore optimality. To ease the burden of local computation, we provide a decentralized stochastic version of ADMM with convergence rate $O(T^{-1}\log T)$, where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along the edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during training, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and an analysis of the 2020 US presidential election data set.
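To fix ideas, a minimal sketch of the joint objective described above, under hedged notation not taken verbatim from the paper: let $m$ denote the number of devices, $\theta_i \in \mathbb{R}^p$ the parameter vector of device $i$, $\ell$ the local loss defining the $M$-estimator, $n_i$ the local sample size, $E(G)$ the edge set of $G$, and $\lambda \ge 0$ a tuning parameter. One natural form of the fused-Lasso-regularized estimator is
$$
(\hat{\theta}_1,\ldots,\hat{\theta}_m) \in \operatorname*{arg\,min}_{\theta_1,\ldots,\theta_m \in \mathbb{R}^p} \ \sum_{i=1}^{m} \frac{1}{n_i} \sum_{k=1}^{n_i} \ell(\theta_i;\, x_{ik}) \ + \ \lambda \sum_{(i,j) \in E(G)} \lVert \theta_i - \theta_j \rVert_2,
$$
where the fusion penalty shrinks the parameters of devices connected in $G$ toward a common value; the paper's exact formulation (e.g., the choice of norm on the pairwise differences) may differ.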