Generative models have made tremendous progress in photorealistic image synthesis in recent years. To enable humans to steer the image generation process and customize the output, many works explore the interpretable dimensions of the latent space in GANs. Existing methods edit attributes of the output image, such as orientation or color scheme, by varying the latent code along certain directions. However, these methods usually require additional human annotations for each pretrained model, and they mostly focus on editing global attributes. In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations. Specifically, we design randomly sampled Gaussian heatmaps that are encoded into the intermediate layers of generative models as spatial inductive bias. While training the GAN model from scratch, these heatmaps are aligned with the emerging attention of the GAN's discriminator in a self-supervised learning manner. During inference, human users can intuitively interact with the spatial heatmaps to edit the output image, for example by varying the scene layout or moving objects within the scene. Extensive experiments show that the proposed method not only enables spatial editing over human faces, animal faces, outdoor scenes, and complicated indoor scenes, but also improves synthesis quality.
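To make the heatmap construction concrete, the following is a minimal NumPy sketch of sampling randomly centered Gaussian heatmaps of the kind described above; the function names, the per-channel layout, and the fixed bandwidth `sigma` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma):
    """Render one 2D Gaussian bump, peaking at `center`, on an (height, width) grid."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def sample_heatmaps(num_maps, height, width, sigma=8.0, rng=None):
    """Sample `num_maps` heatmaps with uniformly random centers, one per channel.

    Illustrative sketch: the actual number of maps, their resolution, and how
    they are injected into intermediate generator layers are design choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    centers = rng.uniform([0.0, 0.0], [height, width], size=(num_maps, 2))
    return np.stack(
        [gaussian_heatmap(height, width, c, sigma) for c in centers]
    )

# A stack of 4 heatmaps at 64x64 resolution, shaped (4, 64, 64);
# each channel's values lie in (0, 1], peaking near its random center.
maps = sample_heatmaps(4, 64, 64)
```

At inference time, spatial editing amounts to moving these centers (e.g. dragging a bump to a new location) and re-running the generator on the updated heatmaps.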