Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption is often violated in practice due to time variation, domain shift, or adversarial concept drift. This paper shows that PU learning is possible even with arbitrarily non-representative positive data, given unlabeled datasets from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We propose two methods to learn under such arbitrary positive bias: the first couples negative-unlabeled (NU) learning with unlabeled-unlabeled (UU) learning, while the second uses a novel recursive risk estimator that is robust to positive shift. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive data bias, including disjoint positive class-conditional supports.
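For context, risk estimators of this kind typically build on the unbiased risk decomposition of du Plessis et al. A minimal sketch of the negative-unlabeled (NU) variant, assuming a fixed negative class prior $\pi_n$ and a surrogate loss $\ell$ (this illustrates the general framework, not the paper's exact recursive estimator):

$$
\widehat{R}_{\mathrm{NU}}(f) \;=\; \pi_n\,\widehat{\mathbb{E}}_{x \sim p_n}\!\big[\ell(f(x), -1)\big] \;+\; \max\!\Big(0,\; \widehat{\mathbb{E}}_{x \sim p_u}\!\big[\ell(f(x), +1)\big] \;-\; \pi_n\,\widehat{\mathbb{E}}_{x \sim p_n}\!\big[\ell(f(x), +1)\big]\Big)
$$

Here the unlabeled data's positive-label risk is debiased by subtracting the negative samples' contribution, using $p_u = \pi_p\, p_p + \pi_n\, p_n$; the $\max(0,\cdot)$ clamp is the non-negativity correction of Kiryo et al. Because only negative and unlabeled samples appear, the estimate remains valid even when the labeled positive data is arbitrarily biased.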