Semi-supervised Learning (SSL) has witnessed great success owing to the impressive performances brought by various methods based on pseudo labeling and consistency regularization. However, we argue that existing methods may fail to utilize the unlabeled data effectively, since they either use a pre-defined/fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior performance and slow convergence. We first analyze a motivating example to obtain intuitions on the relationship between the desirable threshold and the model's learning status. Based on the analysis, we propose FreeMatch to adjust the confidence threshold in a self-adaptive manner according to the model's learning status. We further introduce a self-adaptive class fairness regularization penalty to encourage the model to make diverse predictions during the early training stage. Extensive experiments indicate the superiority of FreeMatch, especially when labeled data are extremely scarce. FreeMatch achieves 5.78%, 13.59%, and 1.28% error rate reduction over the latest state-of-the-art method FlexMatch on CIFAR-10 with 1 label per class, STL-10 with 4 labels per class, and ImageNet with 100 labels per class, respectively. Moreover, FreeMatch can also boost the performance of imbalanced SSL. The code is available at https://github.com/microsoft/Semi-supervised-learning.
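To make the self-adaptive thresholding idea concrete, here is a minimal sketch in PyTorch of how a confidence threshold could track the model's learning status via EMAs of its predictions on unlabeled data, in the spirit described above. The class name, the `ema_decay` value, and the initialization choices are illustrative assumptions, not the official implementation.

```python
import torch
import torch.nn.functional as F

class SelfAdaptiveThreshold:
    """Sketch of a self-adaptive confidence threshold for pseudo labeling.

    Tracks a global confidence estimate (EMA of the mean max-probability)
    and a per-class estimate (EMA of the mean predicted distribution), then
    scales the global threshold per class. Assumed hyperparameters: ema_decay.
    """

    def __init__(self, num_classes: int, ema_decay: float = 0.999):
        self.m = ema_decay
        # Global learning-status estimate, initialized to uniform confidence.
        self.tau = torch.tensor(1.0 / num_classes)
        # Per-class expectation of the model's predicted distribution.
        self.p_tilde = torch.full((num_classes,), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, logits_weak: torch.Tensor) -> None:
        """Update the EMAs from logits on weakly augmented unlabeled data."""
        probs = F.softmax(logits_weak, dim=-1)          # (B, C)
        max_conf = probs.max(dim=-1).values             # (B,)
        self.tau = self.m * self.tau + (1 - self.m) * max_conf.mean()
        self.p_tilde = self.m * self.p_tilde + (1 - self.m) * probs.mean(dim=0)

    def per_class_threshold(self) -> torch.Tensor:
        # Modulate the global threshold by normalized class-wise status, so
        # better-learned classes receive higher thresholds.
        return (self.p_tilde / self.p_tilde.max()) * self.tau

    @torch.no_grad()
    def mask(self, logits_weak: torch.Tensor) -> torch.Tensor:
        """Boolean mask selecting unlabeled samples confident enough to
        contribute a pseudo label under the current adaptive thresholds."""
        probs = F.softmax(logits_weak, dim=-1)
        conf, pred = probs.max(dim=-1)
        return conf >= self.per_class_threshold()[pred]
```

Under this sketch, the threshold starts low (uniform confidence) so more unlabeled data is used early in training, and rises automatically as the model becomes more confident, without a hand-tuned schedule.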