Positive-unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. Recent approaches have addressed this problem via cost-sensitive learning with unbiased loss functions, and their performance was later improved by iterative pseudo-labeling solutions. However, such two-step procedures are vulnerable to incorrectly estimated pseudo-labels, as errors propagate to later iterations when a new model is trained on erroneous predictions. To prevent such confirmation bias, we propose PUUPL, a novel loss-agnostic training procedure for PU learning that incorporates epistemic uncertainty into pseudo-label selection. By using an ensemble of neural networks and assigning pseudo-labels only to low-uncertainty predictions, PUUPL improves the reliability of pseudo-labels, increases the predictive performance of our method, and achieves new state-of-the-art results in self-training for PU learning. In extensive experiments, we demonstrate the effectiveness of our method across different datasets, modalities, and learning tasks, as well as improved calibration and robustness to prior misspecification, biased positive data, and imbalanced datasets.
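To make the uncertainty-based selection step concrete, below is a minimal sketch of how pseudo-labels could be assigned from ensemble predictions. This is an illustration under assumptions, not the paper's implementation: the function name `select_pseudo_labels`, the `threshold` and `top_k` parameters, and the use of ensemble-member variance as the epistemic-uncertainty proxy are all hypothetical choices for exposition.

```python
import numpy as np

def select_pseudo_labels(ensemble_probs, threshold=0.05, top_k=100):
    """Pseudo-label the unlabeled points whose ensemble predictions
    have the lowest epistemic uncertainty (illustrative sketch).

    ensemble_probs: array of shape (n_models, n_unlabeled) holding each
        ensemble member's predicted probability of the positive class.
    Returns the indices of the selected unlabeled points and their
    binary pseudo-labels.
    """
    # Ensemble prediction: average the members' probabilities.
    mean_p = ensemble_probs.mean(axis=0)

    # Epistemic uncertainty proxy (assumption): disagreement among
    # ensemble members, measured as the variance of their predictions.
    epistemic = ensemble_probs.var(axis=0)

    # Keep only the most certain predictions: rank by uncertainty,
    # take at most top_k, and require the uncertainty to be below
    # an absolute threshold.
    ranked = np.argsort(epistemic)[:top_k]
    selected = ranked[epistemic[ranked] < threshold]

    # Assign pseudo-labels by thresholding the ensemble mean at 0.5.
    pseudo_labels = (mean_p[selected] >= 0.5).astype(int)
    return selected, pseudo_labels
```

In an iterative self-training loop, the selected points would be moved from the unlabeled set into the labeled set with their pseudo-labels before the ensemble is retrained; restricting this step to low-uncertainty predictions is what limits the propagation of labeling errors across iterations.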