Unsupervised semantic segmentation aims to cluster pixels into semantically meaningful groups: pixels assigned to the same cluster should share high-level semantic properties, such as their object or part category. This paper presents MaskDistill, a novel framework for unsupervised semantic segmentation based on three key ideas. First, we advocate a data-driven strategy to generate object masks that serve as a pixel grouping prior for semantic segmentation. This approach omits handcrafted priors, which are often designed for specific scene compositions and limit the applicability of competing frameworks. Second, MaskDistill clusters the object masks to obtain pseudo-ground-truth for training an initial object segmentation model. Third, we leverage this model to filter out low-quality object masks. This strategy mitigates the noise in our pixel grouping prior and yields a clean collection of masks, which we use to train a final segmentation model. By combining these components, we considerably outperform previous work on unsupervised semantic segmentation on PASCAL (+11% mIoU) and COCO (+4% mask AP50). Interestingly, as opposed to existing approaches, our framework does not latch onto low-level image cues and is not limited to object-centric datasets. The code and models will be made available.
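The second and third ideas (clustering object masks into pseudo-labels, then discarding low-quality masks) can be illustrated with a minimal sketch. This is not the MaskDistill implementation: it assumes each mask is already summarized by a feature vector and that the initial model assigns each mask a scalar confidence. The names `kmeans` and `filter_masks`, the choice of plain k-means, and the `threshold` value are all hypothetical.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step).

    Returns one cluster id per row of `features`; the ids act as pseudo-labels.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every feature to every center: shape (n, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def filter_masks(confidences, threshold=0.5):
    """Return indices of masks whose (assumed) confidence is at least `threshold`."""
    confidences = np.asarray(confidences, dtype=float)
    return np.flatnonzero(confidences >= threshold)


if __name__ == "__main__":
    # Toy mask features: two well-separated groups of two masks each.
    feats = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
    pseudo_labels = kmeans(feats, k=2)
    kept = filter_masks([0.9, 0.2, 0.7, 0.6], threshold=0.5)
    print(pseudo_labels, kept)
```

In this sketch the masks kept by `filter_masks`, paired with their pseudo-labels, would form the cleaner training set for the final segmentation model.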