The core issue in semi-supervised learning (SSL) lies in how to effectively leverage unlabeled data, whereas most existing methods place great emphasis on utilizing high-confidence samples yet seldom fully explore the use of low-confidence samples. In this paper, we aim to utilize low-confidence samples in a novel way with our proposed mutex-based consistency regularization, namely MutexMatch. Specifically, the high-confidence samples are required to exactly predict "what it is" by the conventional True-Positive Classifier, while the low-confidence samples are employed to achieve a simpler goal -- to predict with ease "what it is not" by the True-Negative Classifier. In this sense, we not only mitigate pseudo-labeling errors but also make full use of the low-confidence unlabeled data through consistency of dissimilarity degree. MutexMatch achieves superior performance on multiple benchmark datasets, i.e., CIFAR-10, CIFAR-100, SVHN, STL-10, mini-ImageNet and Tiny-ImageNet. More importantly, our method further shows superiority when labeled data are scarce, e.g., 92.23% accuracy with only 20 labeled samples on CIFAR-10. Our code and model weights have been released at https://github.com/NJUyued/MutexMatch4SSL.
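The confidence-based split described above can be sketched as follows. This is a minimal illustrative example, not the paper's exact objective: the thresholding rule, the negative-label choice (here, the least-likely class), and the loss forms are simplifying assumptions made for clarity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mutex_pseudo_losses(tpc_logits, tnc_logits, threshold=0.95):
    """Toy sketch of the mutex idea: high-confidence samples get a
    positive pseudo-label loss on the True-Positive Classifier (TPC);
    low-confidence samples get a negative "what it is not" loss on the
    True-Negative Classifier (TNC). Illustrative only."""
    p = softmax(tpc_logits)
    conf = p.max(axis=-1)          # confidence of each unlabeled sample
    hard = p.argmax(axis=-1)       # hard pseudo-label ("what it is")
    high = conf >= threshold
    low = ~high

    # TPC branch: standard cross-entropy against the hard pseudo-label
    pos_loss = -np.log(p[high, hard[high]] + 1e-12).sum()

    # TNC branch: suppress the probability of a negative class
    # (here: the class the TPC deems least likely, i.e. "what it is not")
    q = softmax(tnc_logits)
    neg_label = p.argmin(axis=-1)
    neg_loss = -np.log(1.0 - q[low, neg_label[low]] + 1e-12).sum()

    # Average each branch over the samples it actually used
    return (pos_loss / max(high.sum(), 1),
            neg_loss / max(low.sum(), 1))
```

In this sketch, every unlabeled sample contributes to exactly one branch, so low-confidence samples are no longer discarded; they still provide a (weaker, easier-to-satisfy) training signal through the negative head.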