We introduce a method that automatically segments images into semantically meaningful regions without human supervision. The discovered regions are consistent across different images and coincide with human-defined semantic classes on some datasets. Even where semantic regions are hard for humans to define and label consistently, our method still finds meaningful and consistent semantic classes. In our work, we use a pretrained StyleGAN2~\cite{karras2020analyzing} generative model: clustering in the feature space of the generative model reveals semantic classes. Once classes are discovered, a synthetic dataset of generated images with corresponding segmentation masks can be created. A segmentation model trained on this synthetic dataset then generalizes to real images. Additionally, by using CLIP~\cite{radford2021learning}, we can discover desired semantic classes from prompts given in natural language. We evaluate our method on publicly available datasets and show state-of-the-art results.
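The class-discovery step described above can be sketched in a few lines: per-pixel generator features are clustered, and cluster ids act as pseudo-labels for the segmentation masks. This is a minimal, hypothetical illustration; the toy 2-D "features" and the plain k-means below stand in for the actual StyleGAN2 feature extraction and clustering used in the paper.

```python
# Hypothetical sketch: cluster per-pixel generator features so that
# cluster ids become pseudo-labels for synthetic segmentation masks.
# The toy features below stand in for real StyleGAN2 activations.
import random

def kmeans(points, k, iters=20, seed=0):
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)  # initialize centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # assign each feature vector to its nearest center
        labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # recompute each center as the mean of its assigned members
        for c in range(k):
            members = [pt for pt, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels

# toy per-pixel "features": two well-separated blobs
feats = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels = kmeans(feats, k=2)
# pixels within a blob share a discovered class; the blobs differ
assert labels[0] == labels[1] and labels[2] == labels[3]
assert labels[0] != labels[2]
```

In the full pipeline these cluster ids would be rendered back into mask images aligned with the generated samples, forming the synthetic dataset on which the segmentation model is trained.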