While remarkable success has been achieved in weakly-supervised object localization (WSOL), current frameworks cannot locate objects of novel categories in open-world settings. To address this issue, we are the first to introduce a new weakly-supervised object localization task called OWSOL (Open-World Weakly-Supervised Object Localization). During training, all labeled data come from known categories, while the unlabeled data contain both known and novel categories. To handle such data, we propose a novel paradigm of contrastive representation co-learning over both labeled and unlabeled data to generate a complete Generalized Class Activation Map (G-CAM) for object localization, without requiring bounding-box annotations. As no class labels are available for the unlabeled data, we conduct clustering over the full training set and design a novel contrastive loss driven by multiple semantic centroids for representation learning. We re-organize two widely used datasets, i.e., ImageNet-1K and iNatLoc500, and propose OpenImages150 to serve as evaluation benchmarks for OWSOL. Extensive experiments demonstrate that the proposed method surpasses all baselines by a large margin. We believe this work can shift closed-set localization towards the open-world setting and serve as a foundation for subsequent works. Code will be released at https://github.com/ryylcc/OWSOL.
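The abstract does not spell out the loss; as a rough illustration only, the sketch below shows one plausible InfoNCE-style form of a contrastive objective driven by multiple semantic centroids, where each sample is pulled toward the centroids of its assigned clusters and pushed away from all others. All names (`multi_centroid_contrastive_loss`, `pos_ids`, the temperature value) are hypothetical, not taken from the paper.

```python
import numpy as np

def normalize(x, axis=-1):
    """L2-normalize vectors along the given axis (hypothetical helper)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def multi_centroid_contrastive_loss(emb, centroids, pos_ids, tau=0.07):
    """Sketch of a multiple-centroid contrastive loss (assumed form).

    emb:       (N, D) L2-normalized sample embeddings
    centroids: (K, D) L2-normalized semantic centroids from clustering
    pos_ids:   list of index arrays; pos_ids[i] holds the centroids
               treated as positives for sample i
    tau:       softmax temperature (value assumed, not from the paper)
    Returns the mean loss over samples.
    """
    logits = emb @ centroids.T / tau                      # (N, K) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average the negative log-probability over each sample's positive centroids.
    losses = [-log_prob[i, pos].mean() for i, pos in enumerate(pos_ids)]
    return float(np.mean(losses))
```

In this sketch the centroids would come from clustering the full training set (e.g. k-means over both labeled and unlabeled embeddings), so novel-category samples still receive positive targets despite having no class labels.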