Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically meaningful segments without any labeled data. These tasks are particularly interesting in the unsupervised setting because dense image annotations are difficult and costly to obtain, but existing unsupervised approaches struggle with complex scenes containing multiple objects. Unlike existing methods, which are based purely on deep learning, we take inspiration from traditional spectral segmentation methods and reframe image decomposition as a graph partitioning problem. Specifically, we examine the eigenvectors of the Laplacian of a feature affinity matrix computed from the features of self-supervised networks. We find that these eigenvectors already decompose an image into meaningful segments and can be readily used to localize objects in a scene. Furthermore, by clustering the features associated with these segments across a dataset, we obtain well-delineated, nameable regions, i.e., semantic segmentations. Experiments on complex datasets (Pascal VOC, MS-COCO) demonstrate that our simple spectral method outperforms the state of the art in unsupervised localization and segmentation by a significant margin. In addition, our method can be readily used for a variety of complex image editing tasks, such as background removal and compositing.
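The spectral step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-patch features are synthetic stand-ins for the outputs of a self-supervised network, and the choices of cosine similarity, clipping negative affinities to zero, the unnormalized Laplacian L = D - W, and thresholding the second eigenvector at zero are all assumptions made for illustration. The helper names `spectral_segments` and `make_toy_features` are hypothetical.

```python
import numpy as np


def spectral_segments(features, num_eigenvectors=2):
    """Smallest Laplacian eigenpairs of a feature affinity graph.

    features: (N, D) array with one row per image patch.
    Returns (eigenvalues, eigenvectors) for the num_eigenvectors
    smallest eigenvalues, in ascending order.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    feats = features / np.maximum(norms, 1e-12)

    # Feature affinity matrix W. Clipping negative similarities to zero
    # and removing self-loops are assumptions of this sketch.
    affinity = np.clip(feats @ feats.T, 0.0, None)
    np.fill_diagonal(affinity, 0.0)

    # Unnormalized graph Laplacian L = D - W (assumed variant).
    degree = np.diag(affinity.sum(axis=1))
    laplacian = degree - affinity

    # For symmetric input, eigh returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues[:num_eigenvectors], eigenvectors[:, :num_eigenvectors]


def make_toy_features(patches_per_region=20, dim=8, noise=0.05, seed=0):
    """Synthetic patch features for two regions (a stand-in for real features)."""
    rng = np.random.default_rng(seed)
    centers = np.zeros((2, dim))
    centers[0, 0] = 1.0  # region 0 points along the first axis
    centers[1, 1] = 1.0  # region 1 points along the second axis
    region = np.repeat([0, 1], patches_per_region)
    feats = centers[region] + noise * rng.standard_normal((region.size, dim))
    return feats, region


features, region = make_toy_features()
eigenvalues, eigenvectors = spectral_segments(features)

# The first eigenvector is (near-)constant; the second one, often called
# the Fiedler vector, changes sign across the weakest cut of the graph.
fiedler = eigenvectors[:, 1]
mask = fiedler > 0  # binary segment assignment, one entry per patch
```

On real images the rows of `features` would be patch descriptors reshaped from a feature map, and `mask` would be reshaped back to the patch grid to give a coarse object mask. The sign of an eigenvector is arbitrary, so which segment is labeled `True` carries no meaning by itself.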