Partial-label learning (PLL) is an important weakly supervised learning problem in which each training example is annotated with a set of candidate labels rather than a single ground-truth label. Identification-based methods, which treat the true label as a latent variable to be identified, have been widely explored to resolve the label ambiguity in PLL. However, identifying the true labels accurately and completely remains challenging, and the resulting errors introduce noise into the pseudo-labels used during model training. In this paper, we propose a new method called CroSel, which leverages the models' historical predictions to identify the true labels of most training examples. First, we introduce a cross selection strategy, in which two deep models select the true labels of partially labeled data for each other. Second, we propose a novel consistency regularization term called co-mix to avoid the sample waste and slight noise caused by false selection. In this way, CroSel can pick out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, on CIFAR-type datasets under various settings, our method selects true labels with over 90% accuracy while covering over 90% of the training examples.
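To make the cross selection idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state the selection rule, so the three criteria used here are assumptions made for illustration: the predicted label must lie in the candidate set, it must be unchanged over the stored epochs, and its mean confidence must exceed a threshold. The function names `select_confident` and `cross_select`, the `history` array layout, and the default `threshold=0.9` are all hypothetical.

```python
import numpy as np


def select_confident(history, candidate_mask, threshold=0.9):
    """Pick examples whose label one model predicts stably and confidently.

    history:        (t, n, c) softmax outputs of one model over its last t epochs
                    (assumed layout, not taken from the paper).
    candidate_mask: (n, c) boolean array, True where a label is a candidate.
    Returns (selected, labels): a boolean mask of shape (n,) and the
    predicted label of every example, shape (n,).
    """
    preds = history.argmax(axis=2)              # (t, n) predicted label per epoch
    labels = preds[-1]                          # most recent prediction
    idx = np.arange(labels.shape[0])

    # Assumed criterion 1: the prediction did not change across stored epochs.
    stable = (preds == labels).all(axis=0)
    # Assumed criterion 2: the predicted label is one of the candidates.
    in_candidates = candidate_mask[idx, labels]
    # Assumed criterion 3: mean confidence on that label exceeds the threshold.
    mean_conf = history[:, idx, labels].mean(axis=0)

    selected = stable & in_candidates & (mean_conf > threshold)
    return selected, labels


def cross_select(history_a, history_b, candidate_mask, threshold=0.9):
    """Each model receives the labels selected by the *other* model."""
    return {
        "for_model_a": select_confident(history_b, candidate_mask, threshold),
        "for_model_b": select_confident(history_a, candidate_mask, threshold),
    }


# Toy data: 2 stored epochs, 3 examples, 3 classes.
history_a = np.array([
    [[0.95, 0.03, 0.02], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]],
    [[0.97, 0.02, 0.01], [0.60, 0.30, 0.10], [0.05, 0.05, 0.90]],
])
history_b = history_a.copy()  # identical here only to keep the toy example short
candidate_mask = np.array([
    [True, True, False],
    [True, True, False],
    [True, True, False],
])

result = cross_select(history_a, history_b, candidate_mask)
selected_for_b, labels_for_b = result["for_model_b"]
```

In this toy example only the first example passes all three checks: the second changes its prediction between epochs, and the third is predicted as a label outside its candidate set. Examples that are not selected would, per the abstract, be handled by the co-mix regularization term rather than discarded; that term is not sketched here because the abstract does not describe its form.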