Federated and decentralized machine learning leverage end-user devices for privacy-preserving training of models at lower operating costs than within a data center. In a round of Federated Learning (FL), a random sample of participants trains locally, then a central server aggregates the local models to produce a single model for the next round. In a round of Decentralized Learning (DL), all participants train locally and then aggregate with their immediate neighbors, resulting in many local models with residual variance between them. On the one hand, FL's sampling and lower model variance provide lower communication costs and faster convergence. On the other hand, DL removes the need for a central server and distributes the communication costs more evenly amongst nodes, albeit at a larger total communication cost and slower convergence. In this paper, we present MoDeST: Mostly-Consistent Decentralized Sampling Training. MoDeST implements decentralized sampling in which a random subset of nodes is responsible for training and aggregation every round: this provides the benefits of both FL and DL without their traditional drawbacks. Our evaluation of MoDeST on four common learning tasks (i) confirms convergence as fast as FL, (ii) shows a 3x-14x reduction in communication costs compared to DL, and (iii) demonstrates that MoDeST quickly adapts to nodes joining, leaving, or failing, even when 80% of all nodes become unresponsive.
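To illustrate the round structure described above, the following is a minimal simulation sketch of decentralized sampling: each round, a random subset of nodes trains locally and aggregates by simple parameter averaging, after which the result is shared with all nodes. The function and variable names, the averaging rule, and the toy "training" step are illustrative assumptions for exposition, not the paper's actual protocol or evaluation setup.

```python
import random
import numpy as np

def modest_round(models, sample_size, local_update, rng=random):
    """One illustrative round: a random subset of nodes trains locally,
    then the subset's models are averaged into a mostly-consistent model.
    (Hypothetical sketch; not the paper's exact aggregation protocol.)"""
    # Randomly sample the nodes responsible for this round.
    sampled = rng.sample(range(len(models)), sample_size)
    # Each sampled node performs a local training step on its own data.
    locally_trained = [local_update(models[i]) for i in sampled]
    # The sampled nodes aggregate by simple parameter averaging.
    aggregated = np.mean(locally_trained, axis=0)
    # The aggregated model is disseminated for the next round.
    return [aggregated.copy() for _ in models]

# Toy usage: 100 nodes, 10 sampled per round; "training" is a noisy
# contraction toward zero, standing in for SGD on a local dataset.
if __name__ == "__main__":
    nodes = [np.random.randn(5) for _ in range(100)]
    step = lambda w: w * 0.9 + np.random.randn(5) * 0.01
    for _ in range(20):
        nodes = modest_round(nodes, sample_size=10, local_update=step)
    print("parameter norm after 20 rounds:", np.linalg.norm(nodes[0]))
```

In this toy setting, sampling only 10 of 100 nodes per round keeps per-round communication proportional to the sample size rather than to the full population, which is the intuition behind the communication savings over DL claimed in the abstract.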