Federated Distillation (FD) is a popular novel algorithmic paradigm for Federated Learning, which achieves training performance competitive with prior parameter-averaging-based methods, while additionally allowing the clients to train different model architectures, by distilling the client predictions on an unlabeled auxiliary dataset into a student model. In this work we propose FedAUX, an extension to FD, which, under the same set of assumptions, drastically improves performance by deriving maximum utility from the unlabeled auxiliary data. FedAUX modifies the FD training procedure in two ways: First, unsupervised pre-training on the auxiliary data is performed to find a model initialization for the distributed training. Second, $(\varepsilon, \delta)$-differentially private certainty scoring is used to weight the ensemble predictions on the auxiliary data according to the certainty of each client model. Experiments on large-scale convolutional neural networks and transformer models demonstrate that the training performance of FedAUX exceeds SOTA FL baseline methods by a substantial margin in both the iid and non-iid regime, further closing the gap to centralized training performance. Code is available at github.com/fedl-repo/fedaux.
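The following is a minimal sketch, not the authors' implementation, of the certainty-weighted ensemble distillation step described above: client predictions on an unlabeled auxiliary batch are aggregated with per-client certainty weights and distilled into the student. Names such as `client_models`, `certainty_scores`, and `student` are hypothetical placeholders, and the $(\varepsilon, \delta)$-DP mechanism that produces the scores is omitted.

```python
# Sketch of certainty-weighted ensemble distillation on unlabeled auxiliary data.
# Assumes certainty_scores has shape (num_clients, batch), precomputed elsewhere
# (e.g. via the differentially private scoring mechanism of FedAUX).
import torch
import torch.nn.functional as F

def distill_step(student, client_models, certainty_scores, aux_batch, optimizer):
    """One distillation step on a batch of unlabeled auxiliary data."""
    with torch.no_grad():
        # Soft predictions of every client model on the auxiliary batch.
        client_probs = torch.stack(
            [F.softmax(m(aux_batch), dim=-1) for m in client_models]
        )  # shape: (num_clients, batch, num_classes)
        # Normalize certainty scores into per-example aggregation weights.
        w = certainty_scores / certainty_scores.sum(dim=0, keepdim=True)
        # Weighted ensemble prediction serves as the soft distillation target.
        soft_labels = (w.unsqueeze(-1) * client_probs).sum(dim=0)

    optimizer.zero_grad()
    log_probs = F.log_softmax(student(aux_batch), dim=-1)
    loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```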