Recent research in semi-supervised learning (SSL) is dominated by consistency-regularization-based methods, which achieve strong performance. However, they rely heavily on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation. We argue that PL underperforms because poorly calibrated models produce erroneous high-confidence predictions; these predictions generate many incorrect pseudo-labels, leading to noisy training. We propose an uncertainty-aware pseudo-label selection (UPS) framework that improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process. Furthermore, UPS generalizes the pseudo-labeling process, allowing for the creation of negative pseudo-labels; these negative pseudo-labels can be used for multi-label classification as well as for negative learning to improve single-label classification. We achieve strong performance when compared to recent SSL methods on the CIFAR-10 and CIFAR-100 datasets. We also demonstrate the versatility of our method on the video dataset UCF-101 and the multi-label dataset Pascal VOC.
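The selection idea described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: uncertainty is taken to be the per-class standard deviation of predicted probabilities over several stochastic forward passes (for example, with dropout left active), and the function name `select_pseudo_labels` and the thresholds `tau_p`, `tau_n`, `kappa_p`, `kappa_n` are hypothetical names with illustrative values, not the authors' settings.

```python
import numpy as np


def select_pseudo_labels(probs_samples, tau_p=0.7, tau_n=0.05,
                         kappa_p=0.05, kappa_n=0.005):
    """Pick positive and negative pseudo-labels for unlabeled samples.

    probs_samples: array of shape (T, N, C) holding class probabilities
    from T stochastic forward passes over N samples and C classes.
    All thresholds are illustrative assumptions.
    """
    probs_samples = np.asarray(probs_samples, dtype=float)
    mean_probs = probs_samples.mean(axis=0)   # (N, C) average confidence
    uncertainty = probs_samples.std(axis=0)   # (N, C) spread across passes

    # Positive pseudo-label: the class is predicted with high confidence
    # AND the prediction is stable across passes (low uncertainty).
    positive = (mean_probs >= tau_p) & (uncertainty <= kappa_p)

    # Negative pseudo-label: the class is confidently and stably absent.
    negative = (mean_probs <= tau_n) & (uncertainty <= kappa_n)

    return positive, negative
```

Both returned arrays are boolean masks of shape (N, C). Under these assumptions, a sample whose mean confidence is high but whose predictions vary widely between passes receives no positive pseudo-label, which is the kind of noisy, overconfident prediction the framework is meant to filter out. The negative mask can mark several classes per sample, which is what makes it usable for multi-label data and for negative learning.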