We study the problem of learning from positive and unlabeled (PU) data in the federated setting, where each client labels only a small portion of its dataset due to limited resources and time. Unlike traditional PU learning, where the negative class consists of a single class, the negative samples that a client cannot identify in the federated setting may come from multiple classes unknown to that client. Existing PU learning methods can therefore hardly be applied in this situation. To address this problem, we propose a novel framework, namely Federated learning with Positive and Unlabeled data (FedPU), which minimizes the expected risk over multiple negative classes by leveraging the labeled data on other clients. We theoretically analyze the generalization bound of the proposed FedPU. Empirical experiments show that FedPU achieves much better performance than conventional supervised and semi-supervised federated learning methods.