Positive-unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled data, a setting that arises in many real-world scenarios. However, existing PU learning algorithms cannot handle open and changing environments, where examples from unobserved augmented classes may emerge in the testing phase. In this paper, we propose an unbiased risk estimator for PU learning with Augmented Classes (PUAC) that utilizes unlabeled data from the augmented-classes distribution, which can be easily collected in many real-world scenarios. We also derive an estimation error bound for the proposed estimator, which provides a theoretical guarantee of its convergence to the optimal solution. Experiments on multiple realistic datasets demonstrate the effectiveness of the proposed approach.
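The abstract does not state the form of the PUAC estimator, so the following is only a minimal sketch of the standard unbiased PU risk estimator for ordinary PU learning without augmented classes, given as background for what "unbiased risk estimator" means here. It is not the estimator proposed in the paper. The function names, the choice of sigmoid loss, and the assumption that the class prior `class_prior` is known are all illustrative assumptions.

```python
import numpy as np


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)), where z = y * f(x)."""
    return 1.0 / (1.0 + np.exp(margin))


def upu_risk(scores_pos, scores_unl, class_prior, loss=sigmoid_loss):
    """Standard unbiased PU risk (background sketch, NOT the PUAC estimator).

    scores_pos:  classifier outputs f(x) on labeled positive examples
    scores_unl:  classifier outputs f(x) on unlabeled examples
    class_prior: assumed-known proportion of positives in the unlabeled data
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)

    # Loss of positives when treated as positive (y = +1).
    risk_pos_as_pos = loss(scores_pos).mean()
    # Loss of positives when treated as negative (y = -1).
    risk_pos_as_neg = loss(-scores_pos).mean()
    # Loss of unlabeled examples when treated as negative (y = -1).
    risk_unl_as_neg = loss(-scores_unl).mean()

    # The negative-class risk is recovered from the unlabeled data by
    # subtracting the contribution of the positives hidden inside it.
    return (class_prior * risk_pos_as_pos
            + risk_unl_as_neg
            - class_prior * risk_pos_as_neg)


# Hypothetical scores, for illustration only.
pos_scores = np.array([1.0, 2.0])
neg_scores = np.array([-1.0, -3.0])
unl_scores = np.concatenate([pos_scores, neg_scores])  # class prior 0.5
pu_estimate = upu_risk(pos_scores, unl_scores, class_prior=0.5)
```

In this toy setup the unlabeled set contains the positives in exactly the stated proportion, so the PU estimate coincides with the fully supervised risk computed from the positive and negative scores. In general, the two agree only in expectation.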