Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied. While existing semi-supervised learning (SSL) methods are known to perform poorly on minority classes, we find that they still generate high-precision pseudo-labels on minority classes. By exploiting this property, in this work, we propose Class-Rebalancing Self-Training (CReST), a simple yet effective framework to improve existing SSL methods on class-imbalanced data. CReST iteratively retrains a baseline SSL model with a labeled set expanded by adding pseudo-labeled samples from an unlabeled set, where pseudo-labeled samples from minority classes are selected more frequently according to an estimated class distribution. We also propose a progressive distribution alignment to adaptively adjust the rebalancing strength, dubbed CReST+. We show that CReST and CReST+ improve state-of-the-art SSL algorithms on various class-imbalanced datasets and consistently outperform other popular rebalancing methods. Code has been made available at https://github.com/google-research/crest.
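The core selection step described above, choosing pseudo-labeled samples from minority classes more frequently, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `crest_select`, the confidence-based within-class ranking, and the exact keep-rate formula (rarest class keeps all of its pseudo-labels, the most frequent class the smallest fraction, with a tunable exponent `alpha`) are assumptions made for illustration based on the abstract's description.

```python
import numpy as np

def crest_select(pseudo_labels, confidences, class_counts, alpha=1.0):
    """Sketch of class-rebalanced pseudo-label selection (assumed form).

    Classes with fewer labeled examples contribute a larger fraction of
    their pseudo-labeled samples to the expanded labeled set: the class
    ranked k-th by frequency (descending) keeps a fraction
    (N_{L+1-k} / N_1) ** alpha of its pseudo-labels, so the rarest class
    keeps everything and the most frequent class keeps the least.
    """
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(-counts)            # class indices, most frequent first
    n_sorted = counts[order]               # N_1 >= N_2 >= ... >= N_L
    num_classes = len(counts)
    keep_rate = np.empty(num_classes)
    for rank, cls in enumerate(order):     # rank 0 = most frequent class
        keep_rate[cls] = (n_sorted[num_classes - 1 - rank] / n_sorted[0]) ** alpha

    selected = []
    for cls in range(num_classes):
        idx = np.where(pseudo_labels == cls)[0]
        k = int(round(keep_rate[cls] * len(idx)))
        # keep only the k most confident pseudo-labels of this class
        top = idx[np.argsort(-confidences[idx])[:k]]
        selected.extend(top.tolist())
    return sorted(selected)
```

With a 10:1 imbalance and `alpha=1.0`, the majority class keeps only 10% of its most confident pseudo-labels while the minority class keeps all of them, which is the rebalancing effect the abstract describes.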