Deep learning is pushing the state of the art in many computer vision applications. However, it relies on large annotated data repositories, and capturing the unconstrained nature of real-world data remains an open problem. Semi-supervised learning (SSL) complements the annotated training data with a large corpus of unlabeled data to reduce annotation cost. The standard SSL approach assumes that unlabeled data come from the same distribution as the annotated data. Recently, a more realistic SSL problem, called open-world SSL, has been introduced, in which the unannotated data might contain samples from unknown classes. In this paper, we propose a novel pseudo-label based approach to tackle SSL in the open-world setting. At the core of our method, we utilize sample uncertainty and incorporate prior knowledge about the class distribution to generate reliable class-distribution-aware pseudo-labels for unlabeled data belonging to both known and unknown classes. Extensive experiments demonstrate the effectiveness of our approach: it substantially outperforms the existing state of the art on seven diverse benchmark datasets, including CIFAR-100 (~17%), ImageNet-100 (~5%), and Tiny ImageNet (~9%). We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.