Although federated learning has made impressive advances, most studies have assumed that every client's data are fully labeled. In real-world scenarios, however, each client may hold a significant amount of unlabeled instances. Among the various approaches to exploiting unlabeled data, federated active learning (FAL) has emerged as a promising solution. In this decentralized setting, two types of query selector models are available, namely the 'global' and 'local-only' models, but little literature discusses which one performs better and why. In this work, we first demonstrate that which of the two selector models is superior depends on the global and local inter-class diversity. Furthermore, we observe that the global and local-only models are the keys to resolving the imbalance on each side. Based on these findings, we propose LoGo, a FAL sampling strategy that is robust to varying levels of local heterogeneity and global imbalance ratios, and that integrates both models through a two-step active selection scheme. LoGo consistently outperforms six active learning strategies across a total of 38 experimental settings.
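The abstract describes integrating the two selector models via a two-step active selection scheme. The following is a minimal, hypothetical sketch of what such a scheme could look like: a macro step that clusters the unlabeled pool using features from the local-only model, and a micro step that queries, within each cluster, the instance the global model is most uncertain about. All model outputs here are random stand-ins, and the step assignments are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical two-step query selection combining a local-only model and a
# global model. Features and probabilities are random stand-ins; the real
# LoGo method's details may differ.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Plain k-means on the rows of X; returns cluster assignments."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def entropy(probs):
    """Predictive entropy per row of a probability matrix."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def two_step_select(local_feats, global_probs, budget):
    """Macro step: cluster the unlabeled pool with local-only features.
    Micro step: within each cluster, query the instance with the highest
    global-model uncertainty."""
    assign = kmeans(local_feats, budget)
    uncertainty = entropy(global_probs)
    picks = []
    for j in range(budget):
        members = np.flatnonzero(assign == j)
        if len(members):
            picks.append(int(members[uncertainty[members].argmax()]))
    return sorted(set(picks))

# Toy pool on one client: 200 unlabeled instances, 16-dim local-only
# features, 10-class probabilities from the global model.
local_feats = rng.normal(size=(200, 16))
logits = rng.normal(size=(200, 10))
global_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
queries = two_step_select(local_feats, global_probs, budget=10)
```

In this sketch, the clustering step injects the local-only model's view of the client's class structure, while the per-cluster uncertainty step injects the global model's view, so both models contribute to each queried batch.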