We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. Fidelity, diversity and controllable sampling are the main quality measures of a good image generation model. Most existing models fall short in all three aspects. FusedGAN can perform controllable sampling of diverse images with very high fidelity. We argue that controllability can be achieved by disentangling the generation process into various stages. In contrast to stacked GANs, where multiple stages of GANs are trained separately with full supervision of labeled intermediate images, FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike existing methods, which require full supervision with paired conditions and images, FusedGAN can effectively leverage the more abundant images that lack corresponding conditions during training to produce more diverse samples with high fidelity. We achieve this by fusing two generators, one for unconditional image generation and the other for conditional image generation, where the two partly share a common latent space, thereby disentangling the generation. We demonstrate the efficacy of FusedGAN in fine-grained image generation tasks such as text-to-image and attribute-to-face generation.
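The idea of two generators that partly share a latent space can be illustrated with a toy sketch. The abstract does not specify layer types, dimensions, or how the condition is injected, so everything below is an assumption: the class name `FusedGeneratorSketch`, the toy linear-plus-tanh layers, the concatenation of the condition, and all sizes are hypothetical, and training and discriminators are omitted entirely. It is not the authors' architecture, only a minimal picture of a shared stage feeding an unconditional head and a conditional head.

```python
import math
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix (list of rows) for a toy linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, x):
    """Matrix-vector product followed by tanh, so outputs lie in [-1, 1]."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class FusedGeneratorSketch:
    """Hypothetical toy model: two generator heads on one shared stage."""

    def __init__(self, z_dim=8, cond_dim=4, shared_dim=16, img_dim=32, seed=0):
        self.rng = random.Random(seed)
        self.z_dim = z_dim
        self.cond_dim = cond_dim
        # Shared stage: maps noise z to a code used by both generators.
        self.shared = linear(z_dim, shared_dim, self.rng)
        # Unconditional head: needs only the shared code (assumed design).
        self.uncond_head = linear(shared_dim, img_dim, self.rng)
        # Conditional head: shared code concatenated with the condition (assumed).
        self.cond_head = linear(shared_dim + cond_dim, img_dim, self.rng)

    def sample_z(self):
        """Draw a noise vector from a standard normal distribution."""
        return [self.rng.gauss(0.0, 1.0) for _ in range(self.z_dim)]

    def shared_code(self, z):
        """Code in the latent space common to both generators."""
        return apply(self.shared, z)

    def generate_unconditional(self, z):
        """Flat toy 'image' produced without any condition."""
        return apply(self.uncond_head, self.shared_code(z))

    def generate_conditional(self, z, cond):
        """Flat toy 'image' produced from the shared code plus a condition."""
        if len(cond) != self.cond_dim:
            raise ValueError("condition has the wrong length")
        return apply(self.cond_head, self.shared_code(z) + list(cond))


if __name__ == "__main__":
    gen = FusedGeneratorSketch()
    z = gen.sample_z()
    # Same z, different conditions: the shared code stays fixed.
    img_a = gen.generate_conditional(z, [1.0, 0.0, 0.0, 0.0])
    img_b = gen.generate_conditional(z, [0.0, 1.0, 0.0, 0.0])
    print(len(img_a), img_a != img_b)
```

In this sketch, holding `z` fixed while varying the condition keeps the shared code unchanged and alters only what the conditional head adds, which is one way to read the controllable sampling described above. Because the unconditional head never sees a condition, it could in principle be trained on images that have no paired condition.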