Learning from positive and unlabeled (PU) data is an important problem in various applications. Most recent approaches to PU classification assume that the class prior (the proportion of positive samples) in the unlabeled training data is identical to that of the test data, which does not hold in many practical cases. In addition, the class priors of the training and test data are usually unknown, so it is unclear how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of the proposed method is that it does not require the class priors in the training phase; class-prior shift is incorporated only in the test phase. We justify the proposed method theoretically and demonstrate its effectiveness experimentally.
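The general principle (training is prior-free, priors enter only at test time) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the following: uLSIF, a standard least-squares density ratio estimator with Gaussian kernels, estimates r(x) = p_P(x) / p_U(x) from positive and unlabeled samples; the class-conditional densities are the same at training and test time; and the training prior pi and test prior pi' are known or estimated separately at test time. Writing p_N = (p_U - pi p_P) / (1 - pi), the rule pi' p_P(x) > (1 - pi') p_N(x) becomes r(x) > (1 - pi') / (pi' (1 - pi) + pi (1 - pi')). The function names and the values of `sigma`, `lam`, and `n_centers` are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(x, centers, sigma):
    """K[i, l] = exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_ulsif(x_pos, x_unl, sigma=1.0, lam=0.1, n_centers=100):
    """Fit r(x) ~= p_P(x) / p_U(x) with uLSIF. No class prior is used here."""
    centers = x_pos[:n_centers]                     # kernel centers from positives
    phi_u = gaussian_kernel(x_unl, centers, sigma)  # denominator (unlabeled) samples
    phi_p = gaussian_kernel(x_pos, centers, sigma)  # numerator (positive) samples
    H = phi_u.T @ phi_u / len(x_unl)
    h = phi_p.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    alpha = np.maximum(alpha, 0.0)                  # uLSIF clips negative weights

    def ratio(x):
        return gaussian_kernel(x, centers, sigma) @ alpha

    return ratio


def ratio_threshold(train_prior, test_prior):
    """Predict positive when r(x) exceeds this value (priors used only here)."""
    return (1.0 - test_prior) / (
        test_prior * (1.0 - train_prior) + train_prior * (1.0 - test_prior)
    )


def predict(ratio, x, train_prior, test_prior):
    return np.where(ratio(x) > ratio_threshold(train_prior, test_prior), 1, -1)


# Synthetic 1-D data: positives ~ N(2, 1), negatives ~ N(-2, 1), pi = 0.5.
rng = np.random.default_rng(0)
x_pos = rng.normal(2.0, 1.0, size=(200, 1))
is_pos = rng.random(400) < 0.5
x_unl = np.where(
    is_pos[:, None],
    rng.normal(2.0, 1.0, size=(400, 1)),
    rng.normal(-2.0, 1.0, size=(400, 1)),
)

ratio = fit_ulsif(x_pos, x_unl)  # training step: no prior needed
preds = predict(ratio, np.array([[-2.0], [2.0]]), train_prior=0.5, test_prior=0.5)
```

Under a class-prior shift, only `test_prior` changes. It moves the threshold on r(x), so the fitted `ratio` does not need to be retrained. When pi' = pi, the threshold reduces to 1 / (2 pi), which is equivalent to thresholding the posterior pi r(x) at 1/2.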