Federated Learning allows training machine learning models by using the computation and private data resources of a large number of distributed clients such as smartphones and IoT devices. Most existing works on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data, e.g., due to a lack of expertise. In this work, we consider a server that hosts a labeled dataset and wishes to leverage clients with unlabeled data for supervised learning. We propose a new Federated Learning framework, referred to as SemiFL, to address the problem of Semi-Supervised Federated Learning (SSFL). In SemiFL, clients have completely unlabeled data, while the server has a small amount of labeled data. SemiFL is communication efficient because it separates the training on server-side labeled data from the training on client-side unlabeled data. We demonstrate several strategies of SemiFL that enhance learning performance; a minimal sketch of the alternating training pattern is given below. Extensive empirical evaluations demonstrate that our communication-efficient method can significantly improve the performance of a labeled server with unlabeled clients. Moreover, we demonstrate that SemiFL can outperform many existing FL results trained with fully supervised data, and perform competitively with state-of-the-art centralized Semi-Supervised Learning (SSL) methods. For instance, in standard communication-efficient scenarios, our method achieves 93% accuracy on the CIFAR10 dataset with only 4,000 labeled samples at the server. This accuracy is only 2% below the result obtained by training on 50,000 fully labeled samples, and it improves by about 30% over existing SSFL methods in the communication-efficient setting.
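The following is a minimal sketch (not the authors' implementation) of the alternating server/client training pattern the abstract describes: clients train locally on their unlabeled data, the server aggregates their updates, and then fine-tunes on its small labeled set. The confidence-thresholded pseudo-labeling step and the FedAvg aggregation are assumptions borrowed from standard SSL and FL practice, not details given in the abstract.

```python
# Hypothetical sketch of one SemiFL-style communication round (PyTorch).
import copy
import torch
import torch.nn.functional as F

def server_supervised_step(model, labeled_loader, lr=0.01):
    """One pass of supervised training on the server's small labeled set."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for x, y in labeled_loader:
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

def client_unsupervised_step(global_model, unlabeled_loader, lr=0.01, tau=0.95):
    """Local training on a client's unlabeled data with pseudo-labels
    (assumed scheme), kept only when the global model is confident (>= tau)."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for x in unlabeled_loader:
        with torch.no_grad():
            probs = F.softmax(global_model(x), dim=1)
            conf, pseudo_y = probs.max(dim=1)
            mask = conf >= tau
        if mask.sum() == 0:
            continue
        opt.zero_grad()
        F.cross_entropy(model(x[mask]), pseudo_y[mask]).backward()
        opt.step()
    return model.state_dict()

def fedavg(state_dicts):
    """Uniform FedAvg-style averaging of client model parameters."""
    avg = copy.deepcopy(state_dicts[0])
    for k in avg:
        avg[k] = torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
    return avg

def semifl_round(model, labeled_loader, client_unlabeled_loaders):
    """One communication round: clients train on unlabeled data, the server
    aggregates the client models, then fine-tunes on its labeled data."""
    client_states = [client_unsupervised_step(model, dl)
                     for dl in client_unlabeled_loaders]
    model.load_state_dict(fedavg(client_states))
    server_supervised_step(model, labeled_loader)
    return model
```

Because each round exchanges only model parameters and keeps the labeled and unlabeled training phases on separate sides, the communication cost per round matches that of standard supervised FL.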