Segmenting an image into its parts is a common preprocessing step for high-level vision tasks such as image editing. However, annotating masks for supervised training is expensive. Weakly-supervised and unsupervised methods exist, but they depend on comparing pairs of images, e.g., from multiple views, video frames, or transformations of a single image, which limits their applicability. To address this, we propose a GAN-based approach that generates images conditioned on latent masks, thereby alleviating the full or weak annotations required by previous approaches. We show that such mask-conditioned image generation can be learned faithfully when the masks are conditioned hierarchically on latent keypoints that explicitly define the positions of parts. Without requiring supervision of masks or points, this strategy increases robustness to changes in viewpoint and object position. It also lets us generate image-mask pairs for training a segmentation network, which outperforms state-of-the-art unsupervised segmentation methods on established benchmarks.
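The hierarchical conditioning described above can be illustrated with a minimal sketch: latent keypoints are rendered into Gaussian heatmaps, which are then normalized across parts (plus a background channel) into soft segmentation masks that a generator could be conditioned on. This is an illustrative assumption about one plausible realization, not the paper's exact architecture; all function names and hyperparameters (`sigma`, grid size) are hypothetical.

```python
import torch

def keypoints_to_heatmaps(kp, size=16, sigma=0.1):
    """Render latent keypoints as Gaussian heatmaps.

    kp: (B, K, 2) keypoint coordinates in [-1, 1].
    Returns: (B, K, size, size) heatmaps, one per part.
    """
    ys = torch.linspace(-1.0, 1.0, size)
    xs = torch.linspace(-1.0, 1.0, size)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1)          # (H, W, 2)
    # Squared distance from every pixel to every keypoint.
    d2 = ((grid[None, None] - kp[:, :, None, None, :]) ** 2).sum(-1)
    return torch.exp(-d2 / (2.0 * sigma ** 2))    # (B, K, H, W)

def heatmaps_to_masks(heatmaps):
    """Turn part heatmaps into soft masks via a softmax over channels.

    A zero-logit background channel is appended so pixels far from all
    keypoints are assigned to background; each pixel's masks sum to 1.
    """
    B, K, H, W = heatmaps.shape
    bg = torch.zeros(B, 1, H, W)
    return torch.softmax(torch.cat([heatmaps, bg], dim=1), dim=1)

# Example: 5 latent keypoints per image for a batch of 2.
kp = torch.rand(2, 5, 2) * 2.0 - 1.0
masks = heatmaps_to_masks(keypoints_to_heatmaps(kp))
```

In a full pipeline, `masks` would condition the image generator, and the same keypoint-to-mask mapping would supply pseudo ground truth for training the downstream segmentation network.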