Federated learning (FL) enables edge devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (which may differ in architecture) are trained on the clients and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts a reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We derive a generalization bound for an ensemble of weighted models trained on heterogeneous datasets, which supports the intuition behind Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data heterogeneity.
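To make the weighted consensus distillation idea concrete, the following is a minimal, illustrative sketch in PyTorch. It is not the authors' reference implementation: the function name, the per-client weights, and the particular diversity proxy shown here are assumptions; the paper's exact weighting scheme and regularizer may differ. The sketch only shows how a server model's logits on transfer data could be pulled toward a weighted consensus of client-model predictions while also measuring how much the clients disagree.

```python
import torch
import torch.nn.functional as F

def weighted_consensus_distillation(server_logits, client_logits_list,
                                    client_weights):
    """Illustrative sketch (not Fed-ET's exact objective).

    server_logits:      (B, C) logits of the large server model on transfer data
    client_logits_list: list of K tensors, each (B, C), from the small client models
    client_weights:     (K,) non-negative weights, one per client, summing to 1

    Returns the distillation loss toward the weighted consensus and a
    simple diversity measure over the client predictions.
    """
    # Stack client predictions into a (K, B, C) tensor of probabilities.
    client_probs = torch.stack([F.softmax(l, dim=-1) for l in client_logits_list])
    w = client_weights.view(-1, 1, 1)

    # Weighted consensus target distribution over classes: (B, C).
    consensus = (w * client_probs).sum(dim=0)

    # KL distillation of the server model toward the consensus.
    distill_loss = F.kl_div(F.log_softmax(server_logits, dim=-1),
                            consensus, reduction="batchmean")

    # Diversity proxy: average divergence of each client from the consensus.
    # How this term enters the final objective is an assumption here.
    eps = 1e-8
    diversity = (client_probs
                 * (client_probs.clamp_min(eps).log()
                    - consensus.clamp_min(eps).log())).sum(dim=-1).mean()

    return distill_loss, diversity
```

In this sketch the caller would combine the two returned terms (e.g. distillation loss minus a small coefficient times the diversity term) when training the server model; the precise combination and the rule for computing the per-client weights are defined in the paper.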