Federated Semi-supervised Learning (FSSL) combines techniques from federated learning and semi-supervised learning to improve the accuracy and performance of models in a distributed environment using a small fraction of labeled data and a large amount of unlabeled data. Without centralizing all data in one place for training, it collects model updates after devices train locally, and thus protects the privacy of user data. However, during the federated training process, some devices fail to collect enough data for local training, while new devices join the training group. This leads to an unbalanced global data distribution and thus degrades the performance of the global model. Most current research focuses on class imbalance with a fixed number of classes, while little attention has been paid to data imbalance with a variable number of classes. Therefore, in this paper, we propose Federated Semi-supervised Learning for Class Variable Imbalance (FCVI) to address this problem. A class-variable learning algorithm is used to mitigate the data imbalance caused by changes in the number of classes. Our scheme is shown to be significantly better than baseline methods, while maintaining client privacy.
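To make the federated setting concrete, the following is a minimal sketch of a FedAvg-style training loop in which clients hold different, partially overlapping class sets, so the global data distribution is imbalanced and the effective number of classes varies across clients. This is a generic illustration of "clients train locally, the server aggregates their updates"; it is not the paper's FCVI algorithm, and the toy model, data, and helper names (`local_update`, `aggregate`) are hypothetical.

```python
# Generic FedAvg-style sketch (NOT the FCVI method): a shared softmax-regression
# model is trained locally on clients whose class sets differ, then aggregated
# on the server weighted by local sample counts.
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few epochs of softmax-regression gradient descent."""
    W = weights.copy()
    n_classes = W.shape[1]
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)        # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        onehot = np.eye(n_classes)[y]
        grad = X.T @ (probs - onehot) / len(y)              # cross-entropy gradient
        W -= lr * grad
    return W


def aggregate(client_ws, client_sizes):
    """Server step: average client models weighted by their local sample counts."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_ws, client_sizes))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_classes = 10, 4                                    # feature dim, max number of classes
    global_w = np.zeros((d, n_classes))
    # Clients observe different subsets of classes -> class-variable imbalance.
    client_classes = [[0, 1], [1, 2], [0, 2, 3]]
    for _ in range(20):                                     # communication rounds
        ws, sizes = [], []
        for classes in client_classes:
            y = rng.choice(classes, size=60)
            X = rng.normal(size=(60, d)) + y[:, None]       # toy, class-shifted features
            ws.append(local_update(global_w, X, y))
            sizes.append(len(y))
        global_w = aggregate(ws, sizes)
```

In this plain weighted averaging, clients that never see certain classes still vote on those classes' parameters, which is exactly the kind of distortion the class-variable rebalancing in FCVI is designed to counteract.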