Unsupervised semantic segmentation aims to discover groupings within and across images that capture object- and view-invariance of a category without external supervision. Grouping naturally has levels of granularity, creating ambiguity in unsupervised segmentation. Existing methods avoid this ambiguity and treat it as a factor outside modeling, whereas we embrace it and desire hierarchical grouping consistency for unsupervised segmentation. We approach unsupervised segmentation as a pixel-wise feature learning problem. Our idea is that a good representation shall reveal not just a particular level of grouping, but any level of grouping in a consistent and predictable manner. We enforce spatial consistency of grouping and bootstrap feature learning with co-segmentation among multiple views of the same image, and enforce semantic consistency across the grouping hierarchy with clustering transformers between coarse- and fine-grained features. We deliver the first data-driven unsupervised hierarchical semantic segmentation method, called Hierarchical Segment Grouping (HSG). Capturing visual similarity and statistical co-occurrences, HSG also outperforms existing unsupervised segmentation methods by a large margin on five major object- and scene-centric benchmarks. Our code is publicly available at https://github.com/twke18/HSG.
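To make the two consistency ideas concrete, the sketch below illustrates them on toy pixel features: grouping two views of the same image against shared cluster centroids (co-segmentation consistency), and requiring a pixel's coarse assignment to agree with its fine assignment routed through a fine-to-coarse mapping (hierarchical consistency). This is a minimal illustration only, not the authors' implementation; the helper `soft_kmeans` and all tensor names are hypothetical, and the actual method learns features with clustering transformers and contrastive objectives (see https://github.com/twke18/HSG).

```python
# Minimal sketch of the two consistency ideas in the abstract (NOT the HSG code).
import torch
import torch.nn.functional as F


def soft_kmeans(features, num_clusters, iters=10, tau=0.1):
    """Soft k-means over L2-normalized features; returns (assignments, centroids)."""
    n, _ = features.shape
    centroids = features[torch.randperm(n)[:num_clusters]]        # init from data points
    for _ in range(iters):
        assign = (features @ centroids.t() / tau).softmax(dim=1)  # (n, k) soft assignments
        centroids = (assign.t() @ features) / (assign.sum(0, keepdim=True).t() + 1e-6)
        centroids = F.normalize(centroids, dim=1)
    return assign, centroids


torch.manual_seed(0)
# Toy pixel features for two augmented views of the same image, assumed to be
# spatially aligned after undoing the augmentation.
feats_v1 = F.normalize(torch.randn(256, 64), dim=1)
feats_v2 = F.normalize(feats_v1 + 0.05 * torch.randn(256, 64), dim=1)

# Fine-grained grouping of view 1, then a coarse grouping of the fine centroids.
fine_assign, fine_centroids = soft_kmeans(feats_v1, num_clusters=16)
fine_to_coarse, coarse_centroids = soft_kmeans(fine_centroids, num_clusters=4)

# (1) Co-segmentation consistency: pixels of the second view, assigned against the
# shared fine centroids, should group the same way as pixels of the first view.
fine_assign_v2 = (feats_v2 @ fine_centroids.t() / 0.1).softmax(dim=1)
coseg_loss = F.kl_div(fine_assign_v2.clamp_min(1e-6).log(), fine_assign,
                      reduction="batchmean")

# (2) Hierarchical consistency: a pixel's direct coarse assignment should agree with
# its fine assignment routed through the fine-to-coarse mapping.
coarse_direct = (feats_v1 @ coarse_centroids.t() / 0.1).softmax(dim=1)
coarse_via_fine = fine_assign @ fine_to_coarse
hier_loss = F.kl_div(coarse_direct.clamp_min(1e-6).log(), coarse_via_fine,
                     reduction="batchmean")

print(f"co-segmentation consistency loss: {coseg_loss.item():.4f}")
print(f"hierarchical consistency loss:    {hier_loss.item():.4f}")
```

Both quantities go to zero when grouping is stable across views and predictable across granularity levels, which is the property the abstract asks of a good pixel-wise representation.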