While semi-supervised learning (SSL) has proven to be a promising way to leverage unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that the training class distribution is balanced. However, SSL algorithms trained under imbalanced class distributions can suffer severely when generalizing to a balanced testing criterion, since they rely on pseudo-labels for unlabeled data that are biased toward majority classes. To alleviate this issue, we formulate a convex optimization problem that softly refines the pseudo-labels generated by the biased model, and develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.
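To give a concrete feel for the refinement step, the sketch below aligns the aggregate class distribution of soft pseudo-labels with a target class prior via alternating column/row normalization (a Sinkhorn-style heuristic). This is a minimal illustrative stand-in, not the paper's provable DARP solver; the function name `align_pseudo_labels` and its parameters (`probs`, `target_dist`, `n_iters`) are hypothetical.

```python
import numpy as np

def align_pseudo_labels(probs, target_dist, n_iters=10, eps=1e-8):
    """Softly refine soft pseudo-labels so that their aggregate class
    distribution matches a target prior (Sinkhorn-style sketch, not
    the exact DARP procedure).

    probs:       (N, K) array of model class probabilities (biased).
    target_dist: (K,) desired class distribution, entries sum to 1.
    Returns an (N, K) array of refined pseudo-labels (rows sum to 1).
    """
    X = np.array(probs, dtype=float)
    n = X.shape[0]
    col_target = target_dist * n  # desired total per-class mass over N samples
    for _ in range(n_iters):
        # Rescale columns so class-wise mass matches the target distribution.
        X *= col_target / (X.sum(axis=0) + eps)
        # Renormalize rows so each sample remains a valid distribution.
        X /= X.sum(axis=1, keepdims=True) + eps
    return X

# Example: pseudo-labels biased toward class 0, refined toward a uniform prior.
biased = np.array([[0.9, 0.05, 0.05],
                   [0.8, 0.1, 0.1],
                   [0.7, 0.2, 0.1]])
refined = align_pseudo_labels(biased, np.ones(3) / 3)
```

In this toy example the column sums of `refined` move toward the uniform target while each row still sums to one, which mirrors the intent of the soft refinement described above.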