Towards Realistic Semi-Supervised Learning

Deep learning is pushing the state of the art in many computer vision applications. However, it relies on large annotated data repositories, and capturing the unconstrained nature of real-world data remains an unsolved problem. Semi-supervised learning (SSL) complements the annotated training data with a large corpus of unlabeled data to reduce annotation cost. The standard SSL approach assumes that unlabeled data come from the same distribution as the annotated data. Recently, ORCA [9] introduced a more realistic SSL problem, called open-world SSL, in which the unannotated data may contain samples from unknown classes. This work proposes a novel approach to SSL in the open-world setting, in which we simultaneously learn to classify known and unknown classes. At the core of our method, we use sample uncertainty and incorporate prior knowledge about the class distribution to generate reliable pseudo-labels for unlabeled data belonging to both known and unknown classes. Extensive experiments demonstrate the effectiveness of our approach: it substantially outperforms the existing state of the art on seven diverse benchmark datasets, including CIFAR-100 (17.6%), ImageNet-100 (5.7%), and Tiny ImageNet (9.9%).

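The abstract describes generating pseudo-labels from sample uncertainty together with a prior over the class distribution, but it does not say how either is computed. The sketch below is one possible reading, not the authors' implementation. It assumes that the class prior is enforced with Sinkhorn-Knopp style normalization, that uncertainty is measured as the standard deviation of predictions over several stochastic forward passes, and that all predicted probabilities are strictly positive. The names sinkhorn_assign, pseudo_labels, and max_std are hypothetical.

```python
import numpy as np


def sinkhorn_assign(probs, class_prior, n_iters=50):
    """Rescale an (n_samples, n_classes) probability matrix so that the
    column masses follow class_prior while every row sums to 1.

    Assumption: Sinkhorn-Knopp style normalization is used here as one
    way to inject a class-distribution prior; the abstract does not name
    the technique. Entries of probs are assumed strictly positive.
    """
    n, _ = probs.shape
    q = probs.astype(np.float64).copy()
    q /= q.sum()
    for _ in range(n_iters):
        # Scale columns so that each class receives its prior mass.
        q *= (class_prior / q.sum(axis=0))[None, :]
        # Scale rows so that each sample carries mass 1 / n.
        q *= ((1.0 / n) / q.sum(axis=1))[:, None]
    return q * n  # each row now sums to 1


def pseudo_labels(prob_samples, class_prior, max_std=0.1):
    """Return hard pseudo-labels and a mask of low-uncertainty samples.

    prob_samples: (T, n_samples, n_classes) softmax outputs from T
    stochastic forward passes (hypothetical input format).
    max_std: hypothetical threshold on the per-sample uncertainty.
    """
    mean_probs = prob_samples.mean(axis=0)
    soft = sinkhorn_assign(mean_probs, class_prior)
    labels = soft.argmax(axis=1)
    # Uncertainty: spread across passes of the probability assigned
    # to the chosen class.
    std = prob_samples.std(axis=0)[np.arange(labels.shape[0]), labels]
    keep = std <= max_std
    return labels, keep
```

In this sketch the classes cover both known and unknown categories, so a uniform class_prior would spread pseudo-labels evenly across all of them, and only samples flagged by keep would contribute to training.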