Federated Learning (FL) trains machine learning models using the computation and private data of many distributed clients, such as smartphones and IoT devices. Most existing work on FL assumes that clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data, e.g., due to a lack of expertise. This work considers a server that hosts a labeled dataset and wishes to leverage clients with unlabeled data for supervised learning. We propose a new FL framework, referred to as SemiFL, to address Semi-Supervised Federated Learning (SSFL). In SemiFL, clients have completely unlabeled data, while the server has a small amount of labeled data. SemiFL is communication efficient because it separates the training on server-side supervised data from the training on client-side unsupervised data. We present several strategies for SemiFL that improve efficiency and prediction performance, and we develop intuition for why they work. In particular, we provide a theoretical understanding of the use of strong data augmentation for Semi-Supervised Learning (SSL), which may be of interest in its own right. Extensive empirical evaluations demonstrate that our communication-efficient method can significantly improve the performance of a labeled server with unlabeled clients. Moreover, we demonstrate that SemiFL can outperform many existing FL results obtained with fully supervised data, and that it performs competitively with state-of-the-art centralized SSL methods. For instance, in standard communication-efficient scenarios, our method achieves $93\%$ accuracy on the CIFAR10 dataset with only $4000$ labeled samples at the server. This accuracy is only $2\%$ below that of a model trained on $50000$ fully labeled samples, and it improves on existing SSFL methods by about $30\%$ in the communication-efficient setting.
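To make the setting concrete, the sketch below shows two generic building blocks that a pipeline with a labeled server and unlabeled clients might combine: confidence-thresholded pseudo-labeling on a client, and sample-weighted parameter averaging at the server. This is an illustration under assumptions, not the SemiFL algorithm itself; the abstract does not specify these steps, and the function names `pseudo_label` and `weighted_average` and the `0.95` threshold are hypothetical choices made here.

```python
def pseudo_label(probs, threshold=0.95):
    """Return (sample_index, predicted_class) for confidently predicted samples.

    `probs` is a list of per-sample class-probability lists produced by the
    current global model on a client's unlabeled data. The threshold value
    is an assumption for illustration, not a value taken from the abstract.
    """
    kept = []
    for index, row in enumerate(probs):
        confidence = max(row)
        if confidence >= threshold:
            # Use the most likely class as the pseudo-label.
            kept.append((index, row.index(confidence)))
    return kept


def weighted_average(client_params, client_sizes):
    """Average flat parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("no client contributed any samples")
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical usage: one client filters its predictions, then the server
# aggregates two clients' parameter vectors.
client_probs = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20], [0.01, 0.01, 0.98]]
confident = pseudo_label(client_probs)
aggregated = weighted_average([[1.0, 1.0], [3.0, 3.0]], [1, 3])
```

In this sketch, a client that produces no confident pseudo-labels would contribute zero samples and so carry no weight in the average; whether SemiFL handles such clients this way is not stated in the abstract.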