Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high-precision pseudo-labels on those classes. Exploiting this property, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with a labeled set expanded by adding pseudo-labeled samples from an unlabeled set, where pseudo-labeled samples from minority classes are selected more frequently according to an estimated class distribution. We also propose a progressive distribution alignment scheme that adaptively adjusts the rebalancing strength; we dub the combined method CReST+. We show that CReST and CReST+ improve state-of-the-art SSL algorithms on various class-imbalanced datasets and consistently outperform other popular rebalancing methods.
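As a concrete illustration of the class-rebalanced selection rule described above, the sketch below computes per-class inclusion rates for pseudo-labeled samples from labeled-set class counts, so that rarer classes are selected more often. It follows the paper's rate formulation, but the function name and the default value of the strength parameter alpha are illustrative assumptions, not taken from the released code.

```python
import numpy as np

def crest_sampling_rates(class_counts, alpha=0.5):
    """Per-class inclusion rates for pseudo-labeled samples.

    A minimal sketch of the class-rebalancing idea: with classes
    sorted so that N_1 >= ... >= N_L, the rate for the k-th most
    frequent class is mu_k = (N_{L+1-k} / N_1) ** alpha, so the
    rarest class is kept at rate 1 and the head class at the
    lowest rate. `alpha` tunes the rebalancing strength
    (default here is an assumption, not the paper's setting).
    """
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(-counts)          # class indices by decreasing frequency
    sorted_counts = counts[order]
    # The k-th most frequent class gets the (mirrored) count of the
    # k-th least frequent class, normalized by the largest count.
    rates_sorted = (sorted_counts[::-1] / sorted_counts[0]) ** alpha
    rates = np.empty_like(rates_sorted)
    rates[order] = rates_sorted          # map back to original class indices
    return rates

# Example: a 3-class imbalanced labeled set.
print(crest_sampling_rates([1000, 100, 10]))
# -> head class kept at rate (10/1000)**0.5 = 0.1, rarest class at 1.0
```

Raising alpha strengthens the rebalancing; CReST+'s progressive distribution alignment plays an analogous role by gradually increasing the rebalancing strength across self-training generations.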