Conditional image generation (CIG) is a widely studied problem in computer vision and machine learning. Given a class, CIG takes the name of the class as input and generates a set of images that belong to it. Existing CIG methods generate the images of different classes independently, without considering the relationships among classes. In real-world applications, however, classes are often organized into a hierarchy, and the hierarchical relationships are informative for generating high-fidelity images. In this paper, we aim to leverage the class hierarchy for conditional image generation. We propose two ways of incorporating the class hierarchy: prior control and post constraint. In prior control, we first encode the class hierarchy and then feed the encoding as a prior into the conditional generator. In post constraint, after the images are generated, we measure their consistency with the class hierarchy and use the consistency score to guide the training of the generator. Based on these two ideas, we propose TreeGAN, a model that consists of three modules: (1) a class hierarchy encoder (CHE), which takes the hierarchical structure of the classes and their textual names as input and learns an embedding for each class that captures the hierarchical relationships among classes; (2) a conditional image generator, which takes the CHE-generated embedding of a class as input and generates a set of images belonging to this class; and (3) a consistency checker, which performs hierarchical classification on the generated images and checks whether they are compatible with the class hierarchy. The resulting consistency score is used to guide the generator toward hierarchy-compatible images. Experiments on various datasets demonstrate the effectiveness of our method.
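To make the two ideas concrete, the following is a minimal toy sketch in Python, not the TreeGAN implementation. It assumes a small hypothetical hierarchy (`PARENT`), replaces the learned class hierarchy encoder with a fixed multi-hot ancestor encoding, and replaces the hierarchical classifier with hand-written per-level label predictions. All names in it are illustrative assumptions and do not come from the paper.

```python
# Hypothetical toy hierarchy: child -> parent; the root maps to None.
PARENT = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "beagle": "dog",
    "poodle": "dog",
    "siamese": "cat",
}


def path_to_root(cls, parent=PARENT):
    """Return [cls, parent of cls, ..., root]."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = parent[cls]
    return path


def ancestor_encoding(cls, parent=PARENT):
    """Prior control (stand-in): a multi-hot vector marking cls and its ancestors.

    A real encoder would learn the embedding; this fixed encoding only
    illustrates that classes sharing ancestors receive overlapping codes.
    """
    order = sorted(parent)  # fixed, alphabetical position for each class
    on_path = set(path_to_root(cls, parent))
    return [1 if name in on_path else 0 for name in order]


def is_consistent(predicted_path, parent=PARENT):
    """Post constraint (stand-in): check one root-to-leaf chain of predictions.

    predicted_path lists the label predicted at each level, root first.
    It is consistent if it starts at the root and every label is a child
    of the label predicted one level above it.
    """
    if not predicted_path or parent.get(predicted_path[0], "missing") is not None:
        return False
    pairs = zip(predicted_path, predicted_path[1:])
    return all(parent.get(child) == par for par, child in pairs)


def consistency_score(predicted_paths, parent=PARENT):
    """Fraction of generated images whose predicted path is consistent."""
    if not predicted_paths:
        return 0.0
    ok = sum(is_consistent(p, parent) for p in predicted_paths)
    return ok / len(predicted_paths)


if __name__ == "__main__":
    print(ancestor_encoding("beagle"))
    print(consistency_score([
        ["animal", "dog", "beagle"],   # valid chain
        ["animal", "cat", "beagle"],   # beagle is not a child of cat
    ]))
```

In this sketch the encoding would be passed to a generator as its conditioning input, and the score would be turned into a training signal; both steps are omitted because the abstract does not specify how they are done.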