Current federated learning algorithms require tens of communication rounds, each transmitting unwieldy model weights, under ideal circumstances, and hundreds of rounds when data is poorly distributed. Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL), which significantly reduces communication cost while achieving comparable performance. In just one round, each client distills its private dataset, sends the synthetic data (e.g., images or sentences) to the server, and a global model is collectively trained. The distilled data look like noise and are useful only with the specific model weights they were distilled for, i.e., they become useless after the model updates. With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude less than that of FedAvg, while preserving 93% to 99% of the performance of a centralized counterpart. Afterwards, clients can switch to traditional methods such as FedAvg to fine-tune the last few percentage points and fit personalized local models on their local datasets. Through comprehensive experiments, we show the accuracy and communication performance of DOSFL on both vision and language tasks with different models, including CNNs, LSTMs, and Transformers. We demonstrate that an eavesdropping attacker cannot train a good model from leaked distilled data without knowing the initial model weights. DOSFL serves as an inexpensive method to quickly converge to a performant pre-trained model with less than 0.1% of the communication cost of traditional methods.
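To make the one-round protocol concrete, below is a minimal sketch (not the authors' implementation) of the client-side dataset distillation and server-side training it describes, assuming a shared model initialization known to all parties, PyTorch ≥ 2.0 (for `torch.func.functional_call`), MNIST-shaped inputs, and illustrative hyperparameters (`n_syn`, `outer_steps`, `lr_data`, `lr_model`) chosen only for readability.

```python
# Sketch of a DOSFL-style one-shot round (simplified; hyperparameters are illustrative).
import copy
import torch
import torch.nn.functional as F
from torch.func import functional_call  # stateless forward pass with custom weights


def client_distill(init_model, real_loader, n_syn=10, outer_steps=200,
                   lr_data=0.1, lr_model=0.02):
    """Client side: learn a small synthetic set such that one SGD step of the
    *shared* initial model on it approximates training on the client's real data."""
    # Copy the shared initialization so we can differentiate through its update.
    params = {k: v.detach().clone().requires_grad_(True)
              for k, v in init_model.named_parameters()}
    x_syn = torch.randn(n_syn, 1, 28, 28, requires_grad=True)  # distilled "images"
    y_syn = torch.arange(n_syn) % 10                            # fixed balanced labels
    opt = torch.optim.SGD([x_syn], lr=lr_data)
    for _ in range(outer_steps):
        # Inner step: update the shared init on the synthetic data, differentiably.
        loss_syn = F.cross_entropy(functional_call(init_model, params, (x_syn,)), y_syn)
        grads = torch.autograd.grad(loss_syn, list(params.values()), create_graph=True)
        new_params = {k: p - lr_model * g for (k, p), g in zip(params.items(), grads)}
        # Outer step: the updated weights should fit real data; backprop into x_syn.
        x_real, y_real = next(iter(real_loader))
        loss_real = F.cross_entropy(
            functional_call(init_model, new_params, (x_real,)), y_real)
        opt.zero_grad()
        loss_real.backward()
        opt.step()
    return x_syn.detach(), y_syn  # only this synthetic data is sent to the server


def server_train(init_model, distilled_sets, lr_model=0.02):
    """Server side: starting from the same initialization, train the global model
    on every client's distilled data (here, one SGD step per client's set)."""
    model = copy.deepcopy(init_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr_model)
    for x_syn, y_syn in distilled_sets:
        loss = F.cross_entropy(model(x_syn), y_syn)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


# Hypothetical wiring, e.g. with a toy linear model on 28x28 inputs:
#   init_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
#   distilled = [client_distill(init_model, loader) for loader in client_loaders]
#   global_model = server_train(init_model, distilled)
```

The sketch also illustrates why the leaked synthetic data is of little use to an eavesdropper: `x_syn` is optimized against one particular initialization (`init_model`), so training a differently initialized model on it need not recover the clients' performance.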