Pseudo-labeling (PL) and Data Augmentation-based Consistency Training (DACT) are two approaches widely used in Semi-Supervised Learning (SSL). These methods achieve strong results on many machine learning tasks by exploiting unlabeled data for efficient training. However, in a more realistic setting (termed open-set SSL), where the unlabeled dataset contains out-of-distribution (OOD) samples, traditional SSL methods suffer severe performance degradation. Recent approaches mitigate the negative influence of OOD samples by filtering them out of the unlabeled data. However, it is unclear whether directly removing OOD samples is the best choice, and why PL and DACT perform differently in open-set SSL remains an open question. In this paper, we thoroughly analyze both families of SSL methods (PL and DACT) under open-set SSL and discuss the pros and cons of each. Based on our analysis, we propose Style Disturbance to improve traditional SSL methods in the open-set setting, and we show experimentally that our approach achieves state-of-the-art results on various datasets by properly exploiting OOD samples. We believe our study brings new insights to SSL research.
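To make the two SSL paradigms named above concrete, the following is a minimal, generic sketch (not the authors' method): pseudo-labeling keeps only high-confidence predictions on unlabeled data, while augmentation-based consistency training enforces agreement between weakly and strongly augmented views. The function names, the 0.95 threshold, and the KL-based consistency term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, x_unlabeled, threshold=0.95):
    # PL: predict on unlabeled data and keep only confident predictions
    # as hard "pseudo-labels"; low-confidence samples are masked out.
    with torch.no_grad():
        probs = F.softmax(model(x_unlabeled), dim=1)
        conf, pseudo_labels = probs.max(dim=1)
        mask = conf.ge(threshold).float()
    logits = model(x_unlabeled)
    loss = F.cross_entropy(logits, pseudo_labels, reduction="none")
    return (loss * mask).mean()

def consistency_loss(model, x_weak, x_strong):
    # DACT-style term: the prediction on a strongly augmented view should
    # match the (detached) prediction on a weakly augmented view.
    with torch.no_grad():
        target = F.softmax(model(x_weak), dim=1)
    log_pred = F.log_softmax(model(x_strong), dim=1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```

In open-set SSL, OOD samples in the unlabeled batch can pass the confidence threshold or dominate the consistency term, which is the failure mode the abstract refers to.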