While remarkable success has been achieved in weakly-supervised object localization (WSOL), current frameworks cannot locate objects of novel categories in open-world settings. To address this issue, we are the first to introduce a new weakly-supervised object localization task called OWSOL (Open-World Weakly-Supervised Object Localization). During training, all labeled data comes from known categories, while the unlabeled data contains both known and novel categories. To handle such data, we propose a novel paradigm of contrastive representation co-learning over both labeled and unlabeled data to generate a complete G-CAM (Generalized Class Activation Map) for object localization, without requiring bounding box annotations. As no class labels are available for the unlabeled data, we conduct clustering over the full training set and design a novel multiple-semantic-centroid-driven contrastive loss for representation learning. We re-organize two widely used datasets, i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the proposed method surpasses all baselines by a large margin. We believe this work can shift closed-set localization towards the open-world setting and serve as a foundation for subsequent works. Code will be released at https://github.com/ryylcc/OWSOL.
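To make the centroid-driven contrastive objective concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): an InfoNCE-style loss that pulls an embedding toward the semantic centroids it is assigned to by clustering and pushes it away from the remaining centroids. The function name, the single-sample formulation, and the temperature value are illustrative assumptions.

```python
import numpy as np

def multi_centroid_contrastive_loss(z, centroids, pos_idx, tau=0.07):
    """Hypothetical sketch of a multiple-semantic-centroid contrastive loss.

    z        : (d,) L2-normalized embedding of one sample
    centroids: (K, d) L2-normalized semantic centroids from clustering
    pos_idx  : indices of the centroids assigned to this sample (positives);
               all other centroids act as negatives
    tau      : temperature (0.07 is a common choice, assumed here)
    """
    logits = centroids @ z / tau                  # (K,) scaled cosine similarities
    log_denom = np.log(np.exp(logits).sum())      # log of the InfoNCE denominator
    # Average the per-positive InfoNCE terms over all assigned centroids.
    return float(np.mean([log_denom - logits[k] for k in pos_idx]))

# Toy usage: 3 centroids in 2-D; the embedding aligns with centroid 0.
centroids = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
z = np.array([1.0, 0.0])
loss_good = multi_centroid_contrastive_loss(z, centroids, pos_idx=[0])
loss_bad = multi_centroid_contrastive_loss(z, centroids, pos_idx=[2])
assert loss_good < loss_bad  # matching the correct centroid gives a lower loss
```

Allowing multiple positive centroids per sample is what distinguishes this shape of loss from standard single-prototype contrastive learning: an unlabeled image may plausibly belong to several discovered clusters, and averaging over them hedges against noisy cluster assignments.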