Positive-Unlabeled (PU) learning aims to learn binary classifiers from a few labeled positive examples and many unlabeled ones. Compared with ordinary semi-supervised learning, this task is much more challenging due to the absence of any known negative labels. While existing cost-sensitive methods have achieved state-of-the-art performance, they explicitly minimize the risk of classifying unlabeled data as negative samples, which may induce a negative-prediction preference in the classifier. To alleviate this issue, we approach PU learning from a label distribution perspective in this paper. Noticing that the label distribution of unlabeled data is fixed when the class prior is known, we can naturally use it as learning supervision for the model. Motivated by this, we propose to pursue label distribution consistency between the predicted and ground-truth label distributions, which is formulated by aligning their expectations. Moreover, we adopt entropy minimization and Mixup regularization to avoid the trivial solution of label distribution consistency on unlabeled data and to mitigate the consequent confirmation bias. Experiments on three benchmark datasets validate the effectiveness of the proposed method. Code is available at: https://github.com/Ray-rui/Dist-PU-Positive-Unlabeled-Learning-from-a-Label-Distribution-Perspective.
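To make the abstract's key ingredients concrete, the following is a minimal PyTorch sketch of a loss built from them: supervision on labeled positives, alignment of the expected predicted label distribution on unlabeled data with the known class prior, and entropy minimization. It reflects our own assumptions (a sigmoid-output binary classifier, an L1 alignment of expectations, a hypothetical entropy weight `lam_ent`, Mixup omitted for brevity), not the authors' exact formulation; the linked repository contains the official implementation.

```python
import torch
import torch.nn.functional as F

def dist_pu_loss_sketch(logits_pos, logits_unl, class_prior, lam_ent=0.1):
    """Illustrative PU loss combining the abstract's three ingredients.

    logits_pos:  classifier logits on labeled positive examples
    logits_unl:  classifier logits on unlabeled examples
    class_prior: known P(y = 1), fixing the label distribution of unlabeled data
    lam_ent:     weight for entropy minimization (hypothetical value)
    """
    p_pos = torch.sigmoid(logits_pos)
    p_unl = torch.sigmoid(logits_unl)

    # Labeled positives should be predicted positive.
    loss_pos = F.binary_cross_entropy(p_pos, torch.ones_like(p_pos))

    # Label distribution consistency: align the expectation of the
    # predicted label distribution on unlabeled data with the class prior.
    loss_dist = torch.abs(p_unl.mean() - class_prior)

    # Entropy minimization pushes individual unlabeled predictions toward
    # 0 or 1, discouraging the trivial solution where every example is
    # predicted close to the class prior.
    eps = 1e-7
    entropy = -(p_unl * (p_unl + eps).log()
                + (1 - p_unl) * (1 - p_unl + eps).log()).mean()

    return loss_pos + loss_dist + lam_ent * entropy
```

Note that the distribution-alignment term alone admits degenerate solutions (any predictions whose mean matches the prior), which is why the abstract pairs it with entropy minimization and Mixup regularization.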