Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task: what are the desired objects, when should they be separated into parts, how many are there, and of what classes? The answers to these questions depend on the tasks and datasets of evaluation. In this work, we take a different approach and propose to look for the background instead. This way, the salient objects emerge as a by-product, without any strong assumption on what an object should be. We propose FOUND, a simple model made of a single $1\times1$ convolution, initialized with coarse background masks extracted from self-supervised patch-based representations. After fast training and refinement of these seed masks, the model reaches state-of-the-art results on unsupervised saliency detection and object discovery benchmarks. Moreover, we show that our approach yields good results on the unsupervised semantic segmentation retrieval task. The code to reproduce our results is available at https://github.com/valeoai/FOUND.
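To make the architecture concrete, the following is a minimal sketch (not the authors' released code) of the kind of head the abstract describes: a single $1\times1$ convolution applied to frozen self-supervised patch features (e.g. DINO ViT features reshaped to a spatial grid), producing a per-patch background probability whose complement gives the salient foreground. The class name, feature dimension, and grid size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BackgroundHead(nn.Module):
    """Hypothetical sketch of a FOUND-style head: one 1x1 conv over
    self-supervised patch features, predicting a background mask."""

    def __init__(self, feat_dim: int = 384):  # 384 = ViT-S/16 feature dim (assumption)
        super().__init__()
        self.conv = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: [B, C, H, W] patch features; output: [B, 1, H, W]
        # per-patch background probability in [0, 1].
        return torch.sigmoid(self.conv(feats))

head = BackgroundHead(feat_dim=384)
feats = torch.randn(1, 384, 28, 28)  # stand-in for real frozen patch features
bg = head(feats)                     # predicted background mask
fg = 1.0 - bg                        # salient objects emerge as the complement
print(bg.shape)                      # torch.Size([1, 1, 28, 28])
```

In this sketch, the only trainable parameters are the $C + 1$ weights of the $1\times1$ convolution, which is what makes training fast: the heavy lifting is done by the frozen self-supervised backbone.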