Federated learning enables distributed devices to collaboratively learn a shared prediction model without centralizing on-device training data. Most current algorithms require comparable individual effort to train on-device models of the same structure and size, which impedes participation from resource-constrained devices. Given the widespread yet heterogeneous devices in use today, this paper proposes a new framework, named FedZKT, that supports federated learning across heterogeneous on-device models via Zero-shot Knowledge Transfer. Specifically, FedZKT allows participating devices to independently determine their on-device models. To transfer knowledge across these heterogeneous models, FedZKT develops a zero-shot distillation approach, in contrast to prior research that relies on a public dataset or a pre-trained data generator. To minimize the on-device workload, the resource-intensive distillation task is assigned to the server, which constructs a generator trained adversarially against the ensemble of the received heterogeneous on-device models. The distilled central knowledge is then sent back in the form of the corresponding on-device model parameters, which can be easily absorbed on the device side. Experimental studies demonstrate the effectiveness and robustness of FedZKT with heterogeneous on-device models and under challenging federated learning conditions, such as non-iid data distributions and straggler effects.
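To make the server-side mechanism concrete, below is a minimal PyTorch sketch of one round of the adversarial zero-shot distillation described above, under the assumption that the discrepancy between each on-device model and the ensemble's averaged soft prediction is measured with KL divergence. The function name `zero_shot_distill`, the optimizers, and all hyperparameters are illustrative choices, not taken from the paper's implementation.

```python
# Sketch only: a conditional generator produces synthetic inputs from noise;
# the generator maximizes the students' disagreement with the ensemble while
# server-side copies of the heterogeneous on-device models minimize it.
import copy

import torch
import torch.nn.functional as F


def zero_shot_distill(generator, client_models, steps=100,
                      noise_dim=100, batch_size=64, device="cpu"):
    """One server-side round of the adversarial game between the generator
    and the ensemble of (copies of) heterogeneous on-device models."""
    students = [copy.deepcopy(m).to(device) for m in client_models]
    g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    s_opts = [torch.optim.Adam(s.parameters(), lr=1e-3) for s in students]

    for _ in range(steps):
        z = torch.randn(batch_size, noise_dim, device=device)

        # Generator step: maximize the students' divergence from the
        # ensemble prediction on synthetic data (hence the negative sign).
        x = generator(z)
        with torch.no_grad():
            ensemble = torch.stack(
                [F.softmax(s(x), dim=1) for s in students]).mean(0)
        g_loss = -sum(F.kl_div(F.log_softmax(s(x), dim=1), ensemble,
                               reduction="batchmean") for s in students)
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()

        # Student step: each heterogeneous model absorbs the ensemble
        # knowledge on the same synthetic batch.
        x = generator(z).detach()
        with torch.no_grad():
            ensemble = torch.stack(
                [F.softmax(s(x), dim=1) for s in students]).mean(0)
        for s, opt in zip(students, s_opts):
            loss = F.kl_div(F.log_softmax(s(x), dim=1), ensemble,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Distilled parameters are returned per architecture, to be sent back
    # to the matching devices as described in the abstract.
    return [s.state_dict() for s in students]
```

Because each device receives updated parameters for its own architecture, absorbing the central knowledge requires no extra on-device distillation, which is what keeps the device-side workload low in this design.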