Towards Unbiased Training in Federated Open-world Semi-supervised Learning

Federated Semi-supervised Learning (FedSSL) has emerged as a new paradigm that allows distributed clients to collaboratively train a machine learning model over scarce labeled data and abundant unlabeled data. However, existing FedSSL works rely on a closed-world assumption: all local training data and global testing data come from the seen classes observed in the labeled dataset. It is crucial to go one step further and adapt FL models to an open-world setting, where unseen classes exist in the unlabeled data. In this paper, we propose a novel Federated open-world Semi-Supervised Learning (FedoSSL) framework, which addresses the key challenge of distributed and open-world settings, i.e., the biased training process for heterogeneously distributed unseen classes. Specifically, since whether a given unseen class appears varies from client to client, the locally unseen classes (which exist in multiple clients) are likely to receive a stronger aggregation effect than the globally unseen classes (which exist in only one client). We adopt an uncertainty-aware suppressed loss to alleviate the biased training between locally unseen and globally unseen classes. In addition, we add a calibration module that supplements the global aggregation to avoid potentially conflicting knowledge transfer caused by inconsistent data distributions among clients. The proposed FedoSSL can be easily adapted to state-of-the-art FL methods, which is also validated via extensive experiments on benchmark and real-world datasets (CIFAR-10, CIFAR-100 and CINIC-10).
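The aggregation bias described above can be illustrated with a toy calculation. The following is a minimal sketch, not the FedoSSL method itself: it assumes plain FedAvg-style weighted averaging, and the function name `fedavg`, the scalar "update" values, and the client sizes are all hypothetical stand-ins for real per-class model updates.

```python
def fedavg(client_updates, client_sizes):
    """Weighted average of per-client update dicts (plain FedAvg-style rule)."""
    total = sum(client_sizes)
    keys = client_updates[0].keys()
    return {
        k: sum(u[k] * n for u, n in zip(client_updates, client_sizes)) / total
        for k in keys
    }


# Hypothetical scalar updates for two unseen classes on three clients.
# "locally_unseen": the class appears on every client, so each contributes 1.0.
# "globally_unseen": the class appears only on client 0; the other clients
# have no samples of it and contribute 0.0.
client_updates = [
    {"locally_unseen": 1.0, "globally_unseen": 1.0},
    {"locally_unseen": 1.0, "globally_unseen": 0.0},
    {"locally_unseen": 1.0, "globally_unseen": 0.0},
]
client_sizes = [100, 100, 100]  # assumed equal local dataset sizes

aggregated = fedavg(client_updates, client_sizes)
print(aggregated)
```

Under these assumptions, the update for the class shared by all clients survives aggregation at full strength, while the update for the class held by a single client is diluted to one third. This is the kind of imbalance that the suppressed loss and the calibration module are meant to counteract.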
