Semi-supervised learning (SSL) is an effective means to leverage unlabeled data to improve a model's performance. Typical SSL methods like FixMatch assume that labeled and unlabeled data share the same label space. However, in practice, unlabeled data can contain categories unseen in the labeled set, i.e., outliers, which can significantly harm the performance of SSL algorithms. To address this problem, we propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch. Learning representations of inliers while rejecting outliers is essential for the success of OSSL. To this end, OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers. The OVA classifier outputs the confidence score of a sample being an inlier, providing a threshold to detect outliers. Another key contribution is an open-set soft-consistency regularization loss, which enhances the smoothness of the OVA classifier with respect to input transformations and greatly improves outlier detection. OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
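To make the OVA-based rejection mechanism concrete, the following is a minimal sketch (not the paper's implementation) of how an OVA head can score inliers: each class has its own binary classifier, the closed-set prediction selects which head to consult, and the head's inlier probability is compared against a threshold. The function names, the `(N, K, 2)` logit layout, and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def ova_inlier_score(logits_cls, logits_ova):
    """Score samples with one-vs-all (OVA) classifiers.

    logits_cls: (N, K) closed-set classifier logits over K known classes.
    logits_ova: (N, K, 2) per-class binary logits, ordered [outlier, inlier]
                (an assumed layout for this sketch).
    Returns the closed-set prediction and the inlier probability that the
    predicted class's OVA head assigns to each sample.
    """
    pred = logits_cls.argmax(axis=1)  # closed-set class prediction
    # Softmax over the binary (outlier vs. inlier) axis of each OVA head.
    e = np.exp(logits_ova - logits_ova.max(axis=2, keepdims=True))
    p_ova = e / e.sum(axis=2, keepdims=True)  # (N, K, 2)
    # Take p(inlier) from the head corresponding to the predicted class.
    inlier_prob = p_ova[np.arange(len(pred)), pred, 1]
    return pred, inlier_prob

def detect_outliers(logits_cls, logits_ova, threshold=0.5):
    """Flag a sample as an outlier when its inlier score falls below threshold."""
    pred, inlier_prob = ova_inlier_score(logits_cls, logits_ova)
    return pred, inlier_prob < threshold

# Toy example: two samples, two known classes.
logits_cls = np.array([[5.0, 0.0],
                       [0.0, 5.0]])
logits_ova = np.zeros((2, 2, 2))
logits_ova[0, 0] = [0.0, 5.0]  # sample 0, predicted class 0: confident inlier
logits_ova[1, 1] = [5.0, 0.0]  # sample 1, predicted class 1: confident outlier
pred, is_outlier = detect_outliers(logits_cls, logits_ova)
```

Here sample 0 is kept as an inlier of class 0, while sample 1 is rejected even though the closed-set classifier assigns it to class 1, which is exactly the failure mode a plain FixMatch-style pipeline cannot catch.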