Recent advances in self-supervised visual representation learning have paved the way for unsupervised methods tackling tasks such as object discovery and instance segmentation. However, discovering objects in an image with no supervision is a very hard task: what are the desired objects, when should they be separated into parts, how many are there, and of what classes? The answers to these questions depend on the tasks and datasets used for evaluation. In this work, we take a different approach and propose to look for the background instead. This way, the salient objects emerge as a by-product without any strong assumption on what an object should be. We propose FOUND, a simple model made of a single $1{\times}1$ convolution initialized with coarse background masks extracted from self-supervised patch-based representations. After fast training and refinement of these seed masks, the model reaches state-of-the-art results on unsupervised saliency detection and object discovery benchmarks. Moreover, we show that our approach yields good results on the unsupervised semantic segmentation retrieval task. The code to reproduce our results is available at https://github.com/valeoai/FOUND.
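The core idea above — a single $1{\times}1$ convolution over frozen self-supervised patch features, trained against coarse background seed masks — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature dimension, loss, and stand-in random tensors (in place of real ViT patch features and extracted seed masks) are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class BackgroundHead(nn.Module):
    """Hypothetical FOUND-style head: one 1x1 conv mapping each
    patch feature to a background logit (dimensions assumed)."""

    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) patch-feature map from a frozen backbone.
        return self.head(feats)  # (B, 1, H, W) background logits

model = BackgroundHead(feat_dim=768)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Stand-ins: random "patch features" and random coarse "seed masks".
feats = torch.randn(2, 768, 14, 14)
seed_masks = (torch.rand(2, 1, 14, 14) > 0.5).float()

logits = model(feats)
loss = loss_fn(logits, seed_masks)
loss.backward()
opt.step()

# Salient (foreground) regions are the complement of the background.
saliency = 1.0 - torch.sigmoid(logits)
```

Because the backbone stays frozen and only the single convolution is trained, each step is cheap, which is consistent with the "fast training" claimed in the abstract.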