Given only positive (P) and unlabeled (U) data, PU learning can train a binary classifier without any negative data. It has two building blocks: PU class-prior estimation (CPE) and PU classification; the latter has been well studied, while the former has received less attention. To date, distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution is not contained in the support of the negative data distribution. If this assumption is violated, those CPE methods systematically overestimate the class prior; worse still, the assumption cannot be verified from the data. In this paper, we rethink CPE for PU learning: can we remove the assumption to make CPE always valid? We give an affirmative answer by proposing Regrouping CPE (ReCPE), which builds an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. ReCPE can work with any CPE method by treating it as the base method. Theoretically, ReCPE does not affect its base method if the assumption already holds for the original probability distribution; otherwise, it reduces the positive bias of its base method. Empirically, ReCPE improves all state-of-the-art CPE methods on various datasets, implying that the assumption is indeed violated on these datasets.
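The overestimation described above can be illustrated with a toy sketch (not the paper's method): a classic irreducibility-based CPE estimator takes the infimum of the density ratio p_U(x)/p_P(x), which equals the true prior only where the negative density vanishes relative to the positive one. When the positive support is contained in the negative support, that ratio is bounded away from the prior everywhere, so the estimate is biased upward. The Gaussians and prior below are hypothetical choices for illustration.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density, computed directly to keep the sketch self-contained."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

pi = 0.3                         # true class prior (hypothetical)
x = np.linspace(-10, 10, 20001)  # evaluation grid approximating the infimum
p_pos = gauss_pdf(x, 0.0, 1.0)   # positive-class density

# Case 1: assumption holds -- far in the left tail the negative density
# is negligible relative to the positive one, so inf p_U/p_P recovers pi.
p_neg = gauss_pdf(x, 4.0, 1.0)
p_unl = pi * p_pos + (1 - pi) * p_neg
est_ok = np.min(p_unl / p_pos)

# Case 2: assumption violated -- the wider negative density covers the
# positive support everywhere, so inf p_U/p_P = pi + (1 - pi) * c with c > 0.
p_neg_wide = gauss_pdf(x, 0.0, 2.0)
p_unl_wide = pi * p_pos + (1 - pi) * p_neg_wide
est_biased = np.min(p_unl_wide / p_pos)
```

In Case 1 the estimate matches the true prior of 0.3; in Case 2 it lands well above it, which is the positive bias ReCPE is designed to reduce.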