Federated learning (FL) has emerged as an effective technique for collaboratively training machine learning models without sharing raw data or leaking privacy. However, most existing FL methods focus on the supervised setting and ignore unlabeled data. Although a few existing studies try to incorporate unlabeled data into FL, they all fail to maintain performance guarantees or generalization ability across diverse real-world settings. In this paper, we design a general framework, FedSiam, to tackle different scenarios of federated semi-supervised learning, including four settings in the labels-at-client scenario and two settings in the labels-at-server scenario. FedSiam introduces a siamese network into FL with a momentum update to handle the non-IID challenges introduced by unlabeled data. We further propose a new metric to measure the divergence of local model layers within the siamese network. Based on this divergence, FedSiam adaptively selects layer-level parameters to upload to the server. Experimental results on three datasets under two scenarios with different data distribution settings demonstrate that the proposed FedSiam framework outperforms state-of-the-art baselines.
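To make the two mechanisms named above concrete, the following is a minimal sketch (not the authors' code) of (1) a momentum update between the two branches of a siamese network and (2) a per-layer divergence score used to decide which layer parameters a client uploads. The layer names, the cosine-based divergence, and the threshold `tau` are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of FedSiam's two core ideas; all names and the
# specific divergence metric / threshold rule are assumptions.
import numpy as np

def momentum_update(target, online, m=0.99):
    """EMA update: the target branch slowly tracks the online branch."""
    return {k: m * target[k] + (1.0 - m) * online[k] for k in target}

def layer_divergence(local, reference, eps=1e-12):
    """Per-layer divergence as 1 - cosine similarity (an assumed metric)."""
    div = {}
    for k in local:
        a, b = local[k].ravel(), reference[k].ravel()
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
        div[k] = 1.0 - cos
    return div

def select_layers_to_upload(local, reference, tau=0.5):
    """Keep only layers whose divergence is below the threshold
    (whether low- or high-divergence layers are uploaded is a design
    choice; the low-divergence rule here is an assumption)."""
    div = layer_divergence(local, reference)
    return {k: v for k, v in local.items() if div[k] < tau}

# Toy usage with two fully connected layers.
rng = np.random.default_rng(0)
online = {"fc1": rng.normal(size=(4, 4)), "fc2": rng.normal(size=(4, 2))}
target = {k: v.copy() for k, v in online.items()}
target = momentum_update(target, online, m=0.99)
upload = select_layers_to_upload(online, target, tau=0.5)
print(sorted(upload))  # layer names selected for upload to the server
```

The intent of layer-level selection is to reduce communication and filter out parameters skewed by non-IID unlabeled data, since each client transmits only the layers whose divergence score passes the test rather than the full model.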