Latent space exploration is a technique for discovering interpretable latent directions and manipulating latent codes to edit various attributes of images generated by generative adversarial networks (GANs). However, in previous work, spatial control is limited to simple transformations (e.g., translation and rotation), and it is laborious to identify appropriate latent directions and adjust their parameters. In this paper, we tackle the problem of editing the layout of StyleGAN images by annotating them directly. To this end, we propose an interactive framework that manipulates latent codes in accordance with user inputs. In our framework, the user annotates a StyleGAN image with the locations they want to move or keep fixed and specifies a movement direction by mouse dragging. From these user inputs and the initial latent codes, our latent transformer, based on a transformer encoder-decoder architecture, estimates the output latent codes, which are fed to the StyleGAN generator to obtain the resulting image. To train the latent transformer without manual supervision, we use synthetic data and pseudo-user inputs generated by off-the-shelf StyleGAN and optical flow models. Quantitative and qualitative evaluations demonstrate the effectiveness of our method over existing methods.
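As a rough illustration of the inference path described above, the following sketch shows how a transformer encoder-decoder could map initial latent codes and user annotations to edited codes. All names, layer sizes, and the five-value annotation format (point coordinates, drag vector, move/fix flag) are our assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentTransformer(nn.Module):
    """Hypothetical sketch: maps initial StyleGAN latent codes plus user
    annotations (anchor points and drag vectors) to edited latent codes."""

    def __init__(self, latent_dim=512, d_model=256, nhead=4, num_layers=3):
        super().__init__()
        self.latent_in = nn.Linear(latent_dim, d_model)  # embed each W+ latent
        self.annot_in = nn.Linear(5, d_model)            # (x, y, dx, dy, move/fix flag)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.latent_out = nn.Linear(d_model, latent_dim)  # project back to latent space

    def forward(self, latents, annotations):
        # latents: (B, n_latents, latent_dim) initial W+ codes
        # annotations: (B, n_points, 5) user clicks and drag vectors
        mem = self.annot_in(annotations)   # encoder side: user inputs
        tgt = self.latent_in(latents)      # decoder side: initial latents
        h = self.transformer(src=mem, tgt=tgt)
        return latents + self.latent_out(h)  # predict a residual edit

# Usage with assumed shapes: 18 W+ codes of size 512, 8 annotated points.
model = LatentTransformer()
edited = model(torch.randn(1, 18, 512), torch.randn(1, 8, 5))
```

Predicting a residual over the initial codes (rather than the codes from scratch) is one plausible design choice here, since it keeps the edited latents close to the input when the user requests only a small layout change.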
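The self-supervised training pipeline (synthetic data plus pseudo-user inputs derived from optical flow) could likewise be sketched as below. The perturbation scale, the flow threshold, and the `generator`/`flow_model` callables (e.g., a pretrained StyleGAN and an off-the-shelf flow network such as RAFT) are assumptions; the paper's actual procedure may differ in its details.

```python
import torch

def make_pseudo_user_inputs(generator, flow_model, z_dim=512, n_points=8,
                            device="cuda"):
    """Hypothetical sketch: perturb a latent code, render both images with a
    frozen StyleGAN, and convert the optical flow between them into pseudo
    user drags, with no manual annotation."""
    z = torch.randn(1, z_dim, device=device)
    z_edit = z + 0.1 * torch.randn_like(z)  # small random latent perturbation
    img_a = generator(z)                    # initial image
    img_b = generator(z_edit)               # image after the latent edit
    flow = flow_model(img_a, img_b)         # (1, 2, H, W) dense flow field
    H, W = flow.shape[-2:]
    ys = torch.randint(0, H, (n_points,))
    xs = torch.randint(0, W, (n_points,))
    drags = flow[0, :, ys, xs].T            # (n_points, 2) flow at sampled points
    moved = drags.norm(dim=1) > 1.0         # flag: moving vs. static point
    points = torch.stack([xs, ys], dim=1).float()
    annotations = torch.cat([points, drags, moved[:, None].float()], dim=1)
    return z, z_edit, annotations           # training triplet for the latent transformer
```

Under these assumptions, each sampled triplet pairs initial and target latent codes with annotations in the same (x, y, dx, dy, flag) format consumed by the transformer sketch above, so the model can be trained to reproduce the latent edit from the pseudo drags alone.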