Binary classification from positive and unlabeled data (PU classification) is bottlenecked by two requirements: the unlabeled patterns must be drawn from the test marginal distribution, and the penalty for a false positive error must be identical to that for a false negative error. In practice, however, these requirements are often not fulfilled. In this paper, we generalize PU classification to the class prior shift and asymmetric error scenarios. Based on an analysis of the Bayes optimal classifier, we show that, given a test class prior, PU classification under class prior shift is equivalent to PU classification with asymmetric error. We then propose two frameworks to handle these problems: a risk minimization framework and a density ratio estimation framework. Finally, we demonstrate the effectiveness of the proposed frameworks and compare them through experiments on benchmark datasets.
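The equivalence between class prior shift and asymmetric error can be illustrated numerically at the level of the Bayes optimal classifier. The sketch below is not the PU method of the paper: it assumes a toy one-dimensional problem with known unit-variance Gaussian class-conditional densities, and all names (`posterior`, `shifted_threshold`, `TRAIN_PRIOR`, `TEST_PRIOR`) are hypothetical. It checks that thresholding the test-prior posterior at 1/2 gives the same decisions as thresholding the training-prior posterior at c_FP / (c_FP + c_FN), with costs c_FP = π(1 − π') and c_FN = (1 − π)π', where π is the training class prior and π' the test class prior. This cost parametrization is the standard one and may differ in form from the one used in the paper.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, prior, pos_mean=1.0, neg_mean=-1.0):
    """p(y=+1 | x) for two unit-variance Gaussian class-conditionals (toy assumption)."""
    p_pos = prior * normal_pdf(x, pos_mean)
    p_neg = (1.0 - prior) * normal_pdf(x, neg_mean)
    return p_pos / (p_pos + p_neg)


def shifted_threshold(train_prior, test_prior):
    """Cost-sensitive threshold on the training posterior that mimics the prior shift."""
    c_fp = train_prior * (1.0 - test_prior)   # false positive cost
    c_fn = (1.0 - train_prior) * test_prior   # false negative cost
    return c_fp / (c_fp + c_fn)


def predict_under_test_prior(x, test_prior):
    """Bayes rule with symmetric error under the test class prior."""
    return posterior(x, test_prior) > 0.5


def predict_with_asymmetric_threshold(x, train_prior, test_prior):
    """Bayes rule with asymmetric error under the training class prior."""
    return posterior(x, train_prior) > shifted_threshold(train_prior, test_prior)


TRAIN_PRIOR, TEST_PRIOR = 0.3, 0.7  # illustrative values, not from the paper
grid = [i / 10.0 for i in range(-40, 41)]

# The two decision rules should agree at every grid point.
agree = all(
    predict_under_test_prior(x, TEST_PRIOR)
    == predict_with_asymmetric_threshold(x, TRAIN_PRIOR, TEST_PRIOR)
    for x in grid
)
print("decisions agree on grid:", agree)
```

When the two priors coincide, `shifted_threshold` reduces to 1/2, which recovers the ordinary symmetric-error rule.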