Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems. Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning with fundamentally different communication properties, emerged. FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set, between the central server and the participating clients. While for conventional Federated Learning algorithms, like Federated Averaging (FA), communication scales with the size of the jointly trained model, in FD communication scales with the size of the distillation data set, resulting in advantageous communication properties, especially when large models are trained. In this work, we investigate FD from the perspective of communication efficiency by analyzing the effects of active distillation-data curation, soft-label quantization, and delta-coding techniques. Based on the insights gathered from this analysis, we present Compressed Federated Distillation (CFD), an efficient Federated Distillation method. Extensive experiments on Federated image classification and language modeling problems demonstrate that our method can reduce the amount of communication necessary to achieve fixed performance targets by more than two orders of magnitude compared to FD, and by more than four orders of magnitude compared to FA.
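The abstract names soft-label quantization and delta-coding as the compression mechanisms behind CFD. The snippet below is a minimal, hypothetical sketch of how such a pipeline could be composed; the function names, the uniform quantizer, and the chosen bit width are illustrative assumptions and not the paper's actual implementation.

```python
import numpy as np

def quantize_soft_labels(probs, bits=2):
    """Uniformly quantize soft-label probabilities to a small bit width.
    (Illustrative; the paper's exact quantization scheme may differ.)"""
    levels = 2 ** bits - 1
    return np.round(probs * levels).astype(np.uint8)  # integer codes in [0, levels]

def delta_encode(current_q, previous_q):
    """Delta-code the quantized soft labels against the previous round,
    so only the entries that changed need to be transmitted."""
    delta = current_q.astype(np.int16) - previous_q.astype(np.int16)
    changed = np.nonzero(delta)  # sparse positions of changed entries
    return changed, delta[changed]

# Illustrative usage: soft labels for 3 distillation samples over 4 classes,
# in two consecutive communication rounds.
prev = quantize_soft_labels(np.array([[0.70, 0.10, 0.100, 0.100],
                                      [0.25, 0.25, 0.250, 0.250],
                                      [0.00, 0.00, 1.000, 0.000]]))
curr = quantize_soft_labels(np.array([[0.90, 0.05, 0.025, 0.025],
                                      [0.25, 0.25, 0.250, 0.250],
                                      [0.00, 0.00, 1.000, 0.000]]))
positions, values = delta_encode(curr, prev)
print(positions, values)  # only the few entries that changed between rounds
```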