In this paper, we introduce a new method for generating an object image from text attributes at a desired location in a given base image. Going one step beyond existing studies on text-to-image generation, which focus mainly on object appearance, the proposed method aims both to generate the object image and to preserve the given background information, which is the first attempt in this field. To tackle the problem, we propose a multi-conditional GAN (MC-GAN) that jointly controls the object and background information. As a core component of MC-GAN, we propose a synthesis block that disentangles the object and background information during training. This block enables MC-GAN to generate a realistic object image with the desired background by controlling, through the foreground information derived from the text attributes, how much background information from the given base image is passed through. In experiments on the Caltech-200 bird and Oxford-102 flower datasets, we show that our model generates photo-realistic images at a resolution of 128 x 128. The source code of MC-GAN will be released soon.
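To make the blending mechanism concrete, below is a minimal sketch of how such a synthesis block could be implemented. It is an illustrative assumption, not the paper's exact architecture: the class name `SynthesisBlock`, the layer choices, and the single-channel switch map are all hypothetical. The key idea it demonstrates is that a soft switch predicted from the foreground (text-conditioned) pathway decides, per pixel, how much background feature information from the base image passes through.

```python
import torch
import torch.nn as nn


class SynthesisBlock(nn.Module):
    """Hypothetical sketch: blend text-conditioned foreground features
    with base-image background features via a learned soft switch."""

    def __init__(self, channels: int):
        super().__init__()
        # Residual refinement of the foreground (text-conditioned) features.
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # One-channel switch map in [0, 1]: 1 keeps the foreground,
        # 0 passes the background from the base image through.
        self.switch = nn.Sequential(
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, fg: torch.Tensor, bg: torch.Tensor) -> torch.Tensor:
        fg = fg + self.refine(fg)        # refine foreground features
        m = self.switch(fg)              # per-pixel blending weight
        return m * fg + (1.0 - m) * bg   # disentangled composition


# Toy usage: 16x16 feature maps with 64 channels.
block = SynthesisBlock(64)
fg = torch.randn(1, 64, 16, 16)  # from the text-conditioned generator
bg = torch.randn(1, 64, 16, 16)  # from a base-image encoder
out = block(fg, bg)
print(out.shape)  # torch.Size([1, 64, 16, 16])
```

Under this reading, "disentangling" falls out of the training signal: the generator is only rewarded for foreground pixels where the switch is open, so background reconstruction is pushed onto the encoded base-image features.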