Semi-supervised learning (SSL) is one of the dominant approaches to addressing the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on only a small set of labeled data. A common assumption in most SSL methods is that the labeled and unlabeled data come from the same distribution. However, this assumption rarely holds in real-world scenarios, which limits their applicability. In this work, we instead tackle the challenging open-world SSL problem, which does not make this assumption. In open-world SSL, the objective is to recognize samples of known classes while simultaneously detecting and clustering samples that belong to novel classes present in the unlabeled data. This work introduces OpenLDN, which uses a pairwise similarity loss to discover novel classes. Using a bi-level optimization rule, this pairwise similarity loss exploits the information available in the labeled set to implicitly cluster novel-class samples while recognizing samples from known classes. After discovering novel classes, OpenLDN transforms the open-world SSL problem into a standard SSL problem, so that existing SSL methods can be applied for additional performance gains. Our extensive experiments demonstrate that OpenLDN outperforms current state-of-the-art methods on multiple popular classification benchmarks while offering a better accuracy/training-time trade-off.
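To make the idea of a pairwise similarity loss more concrete, here is a minimal sketch. It assumes a common formulation that the abstract does not spell out: the inner product of two samples' softmax outputs is read as the predicted probability that they share a class, and binary cross-entropy compares it against a 0/1 same-class target. The function names (`softmax`, `pairwise_similarity_loss`) are hypothetical. The sketch takes targets directly from known labels and leaves out the bi-level optimization step, so it does not show how OpenLDN obtains targets for unlabeled pairs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_similarity_loss(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    eps: float = 1e-7,
) -> float:
    """Hypothetical pairwise BCE loss (an assumed formulation, not OpenLDN's code).

    For every ordered pair (i, j) with i != j, the inner product of the two
    predicted class distributions is treated as the probability that the two
    samples belong to the same class. It is scored with binary cross-entropy
    against a target of 1 (same label) or 0 (different labels).
    """
    probs = [softmax(row) for row in logits]
    n = len(probs)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Predicted probability that samples i and j share a class.
            sim = sum(p * q for p, q in zip(probs[i], probs[j]))
            sim = min(max(sim, eps), 1.0 - eps)  # keep log() finite
            target = 1.0 if labels[i] == labels[j] else 0.0
            total += -(target * math.log(sim) + (1.0 - target) * math.log(1.0 - sim))
            count += 1
    return total / count


# Confident predictions that agree with the labels give a small loss;
# the same predictions scored against contradicting labels give a large one.
logits = [[10.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
good = pairwise_similarity_loss(logits, [0, 0, 1])
bad = pairwise_similarity_loss(logits, [0, 1, 0])
print(good, bad)
```

Because the loss only asks whether two samples belong together, it never needs a class index for the unlabeled samples. This is what makes pairwise objectives attractive for clustering novel classes.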