Curating datasets for object segmentation is a difficult task. With the advent of large-scale pre-trained generative models, conditional image generation has seen a significant boost in result quality and ease of use. In this paper, we present a novel method that enables the training of general foreground-background segmentation models from simple textual descriptions, without requiring segmentation labels. We leverage and explore pre-trained latent diffusion models to automatically generate weak segmentation masks for concepts and objects. The masks are then used to fine-tune the diffusion model on an inpainting task, which enables fine-grained removal of the object while simultaneously providing a synthetic foreground and background dataset. We demonstrate that this method outperforms previous methods in both discriminative and generative performance and closes the gap with fully supervised training, while requiring no pixel-wise object labels. We show results on the task of segmenting four different object classes (humans, dogs, cars, birds).
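To make the described pipeline concrete, the following is a minimal sketch in Python assuming the Hugging Face `diffusers` library. The abstract does not specify how the weak masks are extracted from the pre-trained model, so `threshold_attention_mask` below is a hypothetical placeholder (one plausible realization would threshold the model's cross-attention maps for the concept token); the model checkpoints named here are common public ones, not necessarily those used in the paper.

```python
# Sketch of the pipeline: (1) generate an image of the concept from text,
# (2) obtain a weak foreground mask, (3) inpaint the masked region to
# remove the object, yielding an aligned synthetic background.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Generate an image containing the target concept from a text prompt.
gen = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
prompt = "a photo of a dog in a park"
image = gen(prompt).images[0]

# 2. Obtain a weak binary mask for the concept ("dog"), white where the
#    object is. Hypothetical helper: the abstract only states that the
#    pre-trained diffusion model is used to produce weak masks.
mask = threshold_attention_mask(gen, image, concept="dog")

# 3. Inpaint the masked region to remove the object. The (image, mask,
#    background) triple is one sample of the synthetic foreground and
#    background dataset described in the abstract.
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to(device)
background = inpaint(
    prompt="a photo of a park", image=image, mask_image=mask
).images[0]
```

In the paper's method the inpainting model itself is fine-tuned on these weak masks, which this sketch omits; it only illustrates how the mask and inpainting steps compose to yield paired foreground/background training data.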