Semi-Supervised Learning (SSL) aims to learn a model from a small labeled set and a large amount of unlabeled data. To better exploit the unlabeled data, the latest SSL methods use pseudo-labels predicted by a single discriminative classifier. However, the generated pseudo-labels inevitably carry confirmation bias and noise, which greatly affect model performance. In this work, we introduce a new SSL framework named NorMatch. Firstly, we introduce a new uncertainty estimation scheme based on normalizing flows, used as an auxiliary classifier, to enforce highly certain pseudo-labels and thereby boost the discriminative classifier. Secondly, we introduce a threshold-free sample-weighting strategy to better exploit both high- and low-confidence pseudo-labels. Furthermore, we use normalizing flows to model, in an unsupervised fashion, the distribution of the unlabeled data. This modelling can further improve the performance of the generative classifier via unlabeled data, and thus implicitly contributes to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
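The threshold-free weighting idea can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's exact rule: instead of keeping or discarding pseudo-labels with a fixed confidence cutoff, each unlabeled sample's loss is scaled by the confidence of its pseudo-label, so low-confidence samples still contribute, just less.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def weighted_pseudo_loss(logits_weak, logits_strong):
    """Hypothetical threshold-free weighting sketch: the cross-entropy on the
    strongly-augmented view is weighted by the confidence of the pseudo-label
    predicted on the weakly-augmented view, with no hard threshold."""
    p_weak = softmax(logits_weak)
    pseudo = p_weak.argmax(axis=-1)            # hard pseudo-labels
    conf = p_weak.max(axis=-1)                 # per-sample confidence weights
    p_strong = softmax(logits_strong)
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return float(np.mean(conf * ce))           # confidence-weighted mean loss
```

Under this scheme, confident and consistent predictions drive the loss toward zero, while uncertain predictions are softly down-weighted rather than dropped.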