In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with performance matching the state of the art in supervised semantic segmentation of 10 years ago. We propose a methodology that uses unsupervised saliency masks and self-supervised feature clustering to kickstart object discovery, and then trains a semantic segmentation network on the resulting pseudo-labels to bootstrap the system on images with multiple objects. We present results on PASCAL VOC that go far beyond the current state of the art (47.3 mIoU), and we report, for the first time, results on MS COCO for the whole set of 81 classes: our method discovers 34 categories with more than $20\%$ IoU, while obtaining an average IoU of 19.6 for all 81 categories.
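The figures quoted above (mean IoU, and the number of categories exceeding $20\%$ IoU) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that predicted cluster IDs have already been mapped to ground-truth class IDs, which is an assumption not stated in the abstract, and that all pixels of the evaluation set are flattened into two label lists. The function names `per_class_iou` and `summarize` are hypothetical.

```python
def per_class_iou(pred, target, num_classes):
    """Per-class intersection-over-union for two flat label sequences.

    Assumes predictions are already expressed as ground-truth class IDs.
    Returns None for a class that appears in neither sequence.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel agrees: counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Pixel disagrees: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    return [i / u if u > 0 else None for i, u in zip(intersection, union)]


def summarize(ious, threshold=0.2):
    """Return (mean IoU, number of classes with IoU above the threshold)."""
    valid = [v for v in ious if v is not None]
    mean_iou = sum(valid) / len(valid)
    discovered = sum(1 for v in valid if v > threshold)
    return mean_iou, discovered


# Toy data: 8 pixels, 3 classes (illustrative values only).
pred = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 0, 1, 2, 2, 2, 1, 1]

ious = per_class_iou(pred, target, num_classes=3)
mean_iou, discovered = summarize(ious, threshold=0.2)
print(ious, mean_iou, discovered)
```

In this toy case, the per-class IoUs are 2/3, 1/4, and 1/2, so all three classes clear the 0.2 threshold. In a real evaluation, intersections and unions would be accumulated over the entire dataset before dividing.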