Federated learning (FL) is a burgeoning distributed machine learning framework in which a central parameter server (PS) coordinates many local users to train a globally consistent model. Conventional FL inevitably relies on a centralized topology with a PS, so the whole system is paralyzed once the PS fails. To alleviate such a single point of failure, particularly at the PS, existing work has provided decentralized FL (DFL) implementations such as CDSGD and D-PSGD to facilitate FL over a decentralized topology. However, these methods still suffer from some deficiencies, e.g., a significant divergence between users' final models in CDSGD and the need for a network-wide model average in D-PSGD. To address these deficiencies, this paper devises a new DFL implementation coined DACFL, in which each user trains its model with its own training data and exchanges intermediate models with its neighbors through a symmetric and doubly stochastic matrix. DACFL treats each user's local training as a discrete-time process and employs a first-order dynamic average consensus (FODAC) method to track the \textit{average model} in the absence of a PS. We also provide a theoretical convergence analysis of DACFL under the premise of i.i.d. data to strengthen its rationality. Experimental results on MNIST, Fashion-MNIST and CIFAR-10 validate the feasibility of our solution in both time-invariant and time-varying network topologies, and show that DACFL outperforms D-PSGD and CDSGD in most cases.
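As a rough illustration of the tracking mechanism mentioned above, a standard first-order dynamic average consensus update can be sketched as follows; the symbols $z_i$, $x_i$ and $W$ are illustrative placeholders (local average estimate, local model, and doubly stochastic mixing matrix, respectively) and are not necessarily this paper's own notation:
% Minimal FODAC-style sketch (illustrative notation, not the paper's):
% each user i keeps an estimate z_i(t) of the network-wide average model and
% updates it by mixing with its neighbors' estimates (doubly stochastic weights
% W_{ij}) plus the increment of its own local model x_i(t).
\begin{equation*}
  z_i(t+1) \;=\; \sum_{j \in \mathcal{N}_i \cup \{i\}} W_{ij}\, z_j(t)
  \;+\; \bigl( x_i(t+1) - x_i(t) \bigr),
  \qquad z_i(0) = x_i(0),
\end{equation*}
so that, because $W$ is doubly stochastic, $\sum_i z_i(t) = \sum_i x_i(t)$ at every step and, under suitable connectivity assumptions, each $z_i(t)$ tracks the average $\frac{1}{n}\sum_{j=1}^{n} x_j(t)$ without a central PS.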