Recent state-of-the-art methods in imbalanced semi-supervised learning (SSL) rely on confidence-based pseudo-labeling with consistency regularization. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data; thus, even the pseudo-labels for high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective on pseudo-labeling for imbalanced SSL. Without relying on model confidence, we propose to measure whether an unlabeled sample is likely to be ``in-distribution,'' i.e., close to the current training data. To decide whether an unlabeled sample is ``in-distribution'' or ``out-of-distribution,'' we adopt the energy score from the out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data can better approximate the true class distribution and improve the model. Experiments demonstrate that our energy-based pseudo-labeling method, \textbf{InPL}, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks. For example, it produces an absolute accuracy improvement of around 3\% on CIFAR10-LT. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained. In particular, in one of the most challenging scenarios, InPL achieves a 6.9\% accuracy improvement over the best competitor.
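For concreteness, the energy score referenced above follows its standard definition in the out-of-distribution detection literature; the thresholding sketch below is an illustration under that assumption, not a verbatim statement of InPL's full selection rule. For a sample $x$ with $K$ class logits $f_k(x)$ and temperature $T$, the energy score is
\[
E(x; f) \;=\; -\,T \log \sum_{k=1}^{K} e^{f_k(x)/T},
\]
where lower energy indicates that $x$ is more likely in-distribution; an unlabeled sample would then be assigned a pseudo-label only when $E(x; f) \le \tau$ for some energy threshold $\tau$.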