In this paper, we study a one-shot distributed learning algorithm via refitting bootstrap samples, which we refer to as ReBoot. Given the local models fit on multiple independent subsamples, ReBoot refits a new model on the union of the bootstrap samples drawn from these local models. The whole procedure requires only one round of communication of model parameters. Theoretically, we analyze the statistical rate of ReBoot for generalized linear models (GLM) and noisy phase retrieval, which represent convex and non-convex problems, respectively. In both cases, ReBoot provably achieves the full-sample statistical rate whenever the subsample size is not too small. In particular, we show that the systematic bias of ReBoot, the error that is independent of the number of subsamples, is $O(n^{-2})$ in GLM, where $n$ is the subsample size. This rate is sharper than that of model parameter averaging and its variants, implying that ReBoot tolerates a larger number of data splits while maintaining the full-sample rate. Simulation studies demonstrate the statistical advantage of ReBoot over competing methods, including averaging and CSL (Communication-efficient Surrogate Likelihood) with up to two rounds of gradient communication. Finally, we propose FedReBoot, an iterative version of ReBoot, to aggregate convolutional neural networks for image classification, which exhibits substantial superiority over FedAvg within early rounds of communication.
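The one-shot pipeline described above (fit local models, draw bootstrap samples from them, refit once on the pooled sample) can be sketched on a toy problem. The example below uses Gaussian mean estimation purely for illustration, which is an assumption on our part (the paper analyzes GLMs and noisy phase retrieval); the function name `reboot_mean` and all parameters are hypothetical.

```python
# Illustrative sketch of one-shot ReBoot on a toy Gaussian mean-estimation
# problem (our simplification; the paper treats GLMs and phase retrieval).
# Each machine fits a local parametric model, only the fitted parameters
# (mu_k, sigma_k) are communicated (one round), the center draws bootstrap
# samples from every local model, and a single global model is refit on
# the union of those bootstrap samples.
import random
import statistics


def reboot_mean(subsamples, n_boot_per_machine, seed=0):
    rng = random.Random(seed)
    pooled = []
    for data in subsamples:
        # Step 1: each machine fits a local model and ships its parameters.
        mu_k = statistics.fmean(data)
        sigma_k = statistics.pstdev(data)
        # Step 2: the center draws a bootstrap sample from the local model.
        pooled.extend(rng.gauss(mu_k, sigma_k)
                      for _ in range(n_boot_per_machine))
    # Step 3: refit one global model on the union of bootstrap samples.
    return statistics.fmean(pooled)


if __name__ == "__main__":
    data_rng = random.Random(42)
    true_mu = 3.0
    # 10 machines, 200 observations each (subsample size n = 200).
    subsamples = [[data_rng.gauss(true_mu, 1.0) for _ in range(200)]
                  for _ in range(10)]
    print(reboot_mean(subsamples, n_boot_per_machine=1000))
```

Note that the communication cost is independent of the subsample size: only the local parameter estimates cross the network, never the raw data or gradients.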