Semi-supervised learning is the problem of training an accurate predictive model by combining a small labeled dataset with a presumably much larger unlabeled dataset. Many methods for semi-supervised deep learning have been developed, including pseudolabeling, consistency regularization, and contrastive learning techniques. Pseudolabeling methods, however, are highly susceptible to confounding: erroneous pseudolabels are taken as true labels in early iterations, causing the model to reinforce its prior biases and thereby fail to generalize to strong predictive performance. We present a new approach to suppressing confounding errors, which we call Semi-supervised Contrastive Outlier removal for Pseudo Expectation Maximization (SCOPE). Like basic pseudolabeling, SCOPE is related to Expectation Maximization (EM), a latent-variable framework that can be extended toward understanding cluster-assumption deep semi-supervised algorithms. Unlike basic pseudolabeling, however, which fails to adequately account for the probability of the unlabeled samples given the model, SCOPE introduces an outlier suppression term designed to improve the behavior of EM iteration with a discriminative DNN backbone in the presence of outliers. Our results show that SCOPE greatly improves semi-supervised classification accuracy over a baseline, and that when combined with consistency regularization it achieves the highest reported accuracy on the semi-supervised CIFAR-10 classification task using 250 and 4000 labeled samples. Moreover, we show that SCOPE reduces the prevalence of confounding errors during pseudolabeling iterations by pruning erroneous high-confidence pseudolabeled samples that would otherwise contaminate the labeled set in subsequent retraining iterations.
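To make the pruning idea concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm) of confidence-thresholded pseudolabeling with an added outlier-rejection step. Here, a simple centroid-distance test in embedding space stands in for SCOPE's contrastive outlier-removal term; all function names, thresholds, and the distance criterion are assumptions for illustration only.

```python
import numpy as np

def select_pseudolabels(probs, embeddings, class_centroids, tau=0.95, max_dist=1.0):
    """Confidence-thresholded pseudolabeling with a simple outlier-pruning step.

    probs:           (N, C) softmax outputs for the unlabeled samples
    embeddings:      (N, D) feature embeddings of the same samples
    class_centroids: (C, D) per-class centroids computed from labeled embeddings
    Returns the indices of accepted samples and their pseudolabels.
    """
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    confident = conf >= tau  # standard high-confidence pseudolabel filter
    # Outlier suppression (illustrative stand-in for SCOPE's contrastive term):
    # reject samples whose embedding lies far from their pseudo-class centroid,
    # even if the classifier is confident about them.
    dists = np.linalg.norm(embeddings - class_centroids[labels], axis=1)
    keep = confident & (dists <= max_dist)
    return np.nonzero(keep)[0], labels[keep]

# Toy usage: sample 0 is confident and near its centroid (kept), sample 1 is
# low-confidence (dropped), sample 2 is confident but an embedding-space
# outlier (pruned before it can contaminate the labeled set).
probs = np.array([[0.99, 0.01], [0.60, 0.40], [0.97, 0.03]])
embeddings = np.array([[0.1, 0.1], [0.2, 0.0], [4.0, 4.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
idx, pseudo = select_pseudolabels(probs, embeddings, centroids)
```

The point of the sketch is the second filter: plain pseudolabeling would accept sample 2 on confidence alone, whereas the outlier test removes it, which is the confounding-suppression behavior the abstract describes.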