Rethinking Re-Sampling in Imbalanced Semi-Supervised Learning

Semi-Supervised Learning (SSL) has shown a strong ability to exploit unlabeled data when labeled data is scarce. However, most SSL algorithms assume that the class distributions are balanced in both the training and the test sets. In this work, we consider SSL on class-imbalanced data, a setting that better reflects real-world situations but has so far received only limited attention. In particular, we decouple the training of the representation and the classifier, and systematically investigate the effects of different data re-sampling techniques both when training the whole network, including the classifier, and when fine-tuning only the feature extractor. We find that data re-sampling is critically important for learning a good classifier, because it increases the accuracy of the pseudo-labels, in particular for the minority classes in the unlabeled data. Interestingly, we find that accurate pseudo-labels do not help when training the feature extractor; on the contrary, data re-sampling harms the training of the feature extractor. This finding runs against the general intuition that wrong pseudo-labels always harm model performance in SSL. Based on these findings, we suggest rethinking the current paradigm of using a single data re-sampling strategy, and develop a simple yet highly effective Bi-Sampling (BiS) strategy for SSL on class-imbalanced data. BiS implements two different re-sampling strategies for training the feature extractor and the classifier, and integrates this decoupled training into an end-to-end framework... Code will be released at https://github.com/TACJu/Bi-Sampling.
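The abstract does not say which two re-sampling strategies BiS uses. The following is a minimal sketch of the decoupled-sampling idea under stated assumptions, not the released BiS code. It assumes a common family of re-sampling schemes in which class c, with n_c samples, is drawn with probability p_c = n_c^q / sum_j n_j^q (q = 1 gives the original, instance-balanced distribution; q = 0 gives class-balanced sampling). Consistent with the findings above, it assumes the feature-extractor branch sees the original distribution while the classifier branch sees a class-balanced one. The function names and the choice of q are hypothetical; for unlabeled data the labels would be pseudo-labels.

```python
import random
from collections import Counter


def sampling_weights(labels, q):
    """Per-sample weights so that class c receives total mass n_c ** q.

    q = 1.0 reproduces instance-balanced sampling (every sample weight 1);
    q = 0.0 gives class-balanced sampling (every class has mass 1).
    """
    counts = Counter(labels)
    # Spread the class mass n_c ** q evenly over its n_c samples.
    return [counts[y] ** q / counts[y] for y in labels]


def draw_batch(labels, weights, batch_size, rng):
    """Draw a batch of sample indices with replacement, proportional to weights."""
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


def bi_sampling_batches(labels, batch_size, seed=0):
    """Hypothetical helper: return two index batches drawn with different samplers.

    The first batch (instance-balanced, q=1) is meant for the feature-extractor
    branch; the second (class-balanced, q=0) is meant for the classifier branch.
    """
    rng = random.Random(seed)
    feature_weights = sampling_weights(labels, q=1.0)
    classifier_weights = sampling_weights(labels, q=0.0)
    feature_batch = draw_batch(labels, feature_weights, batch_size, rng)
    classifier_batch = draw_batch(labels, classifier_weights, batch_size, rng)
    return feature_batch, classifier_batch


if __name__ == "__main__":
    # A toy long-tailed label set: 900 / 90 / 10 samples for classes 0 / 1 / 2.
    labels = [0] * 900 + [1] * 90 + [2] * 10
    feat, clf = bi_sampling_batches(labels, batch_size=3000, seed=0)
    print("feature branch:", Counter(labels[i] for i in feat))
    print("classifier branch:", Counter(labels[i] for i in clf))
```

On this toy data the feature-extractor batch keeps roughly the 90 / 9 / 1 percent class ratio of the data, while the classifier batch draws each class about one third of the time, so minority-class samples are repeated many times. Keeping the two samplers separate is what lets each branch use the distribution that the findings above favor for it.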
