In this paper, we study the problem of unsupervised object segmentation from single images. We do not introduce a new algorithm, but instead systematically investigate the effectiveness of existing unsupervised models on challenging real-world images. We first introduce four complexity factors to quantitatively measure the distributions of object- and scene-level biases in appearance and geometry for datasets with human annotations. With the aid of these factors, we empirically find that, not surprisingly, existing unsupervised models fail catastrophically to segment generic objects in real-world images, even though they easily achieve excellent performance on numerous simple synthetic datasets, owing to the vast gap in objectness biases between synthetic and real images. Through extensive experiments on multiple groups of ablated real-world datasets, we ultimately find that the key factors underlying this failure are the challenging distributions of object- and scene-level biases in appearance and geometry: the inductive biases built into existing unsupervised models can hardly capture such diverse object distributions. Our results suggest that future work should exploit more explicit objectness biases in network design.