Semi-supervised learning (SSL) is one of the dominant approaches to addressing the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on only a small set of labeled data. Most SSL methods assume that the labeled and unlabeled data come from the same underlying data distribution. However, this is often not the case in real-world scenarios, which limits the applicability of these methods. In this work, we instead address the recently proposed and challenging open-world SSL problem, which does not make this assumption. In the open-world SSL problem, the objective is to recognize samples of known classes while simultaneously detecting and clustering samples of novel classes present in the unlabeled data. This work introduces OpenLDN, which uses a pairwise similarity loss to discover novel classes. Using a bi-level optimization rule, this pairwise similarity loss exploits the information available in the labeled set to implicitly cluster novel class samples while simultaneously recognizing samples from known classes. After discovering novel classes, OpenLDN transforms the open-world SSL problem into a standard SSL problem to achieve additional performance gains using existing SSL methods. Our extensive experiments demonstrate that OpenLDN outperforms the current state-of-the-art methods on multiple popular classification benchmarks while providing a better accuracy/training time trade-off.
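To make the idea of a pairwise similarity loss concrete, the following is a minimal sketch of one common formulation, not OpenLDN's actual objective. It assumes that the similarity of two samples is the inner product of their softmax class probabilities and that this similarity is trained with binary cross-entropy against a 0/1 "same class" target. The function names (`pairwise_similarity_loss`, `targets_from_labels`) are hypothetical, and the bi-level optimization rule mentioned in the abstract is not reproduced here; the targets are simply derived from ground-truth labels, as would be possible on the labeled set.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def targets_from_labels(labels):
    """Build an (n, n) matrix with 1.0 where two samples share a label.

    Illustrative only: this works for labeled samples. How pair targets
    are obtained for unlabeled samples is not specified in the abstract.
    """
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.float64)


def pairwise_similarity_loss(logits, pair_targets, eps=1e-7):
    """Binary cross-entropy between predicted and target pair similarity.

    The predicted similarity of samples i and j is the inner product of
    their class-probability vectors, which lies in [0, 1] and approaches
    1 only when both samples are confidently assigned to the same class.
    """
    probs = softmax(logits)
    sim = np.clip(probs @ probs.T, eps, 1.0 - eps)  # (n, n) similarities
    bce = -(pair_targets * np.log(sim)
            + (1.0 - pair_targets) * np.log(1.0 - sim))
    return bce.mean()


if __name__ == "__main__":
    targets = targets_from_labels([0, 0, 1, 1])
    # Logits that group the samples consistently with the pair targets.
    consistent = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 5.0], [0.0, 5.0]])
    # Logits that split each same-class pair across different classes.
    inconsistent = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [0.0, 5.0]])
    print(pairwise_similarity_loss(consistent, targets))
    print(pairwise_similarity_loss(inconsistent, targets))
```

In this sketch the loss is lower when samples that should be similar receive the same class assignment, which is why minimizing such a loss can group samples into clusters without per-sample labels for the novel classes.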