We present a generic image-to-image translation framework, pixel2style2pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. Next, we propose utilizing our encoder to directly solve image-to-image translation tasks, defining them as encoding problems from some input domain into the latent domain. By deviating from the standard "invert first, edit later" methodology used with previous StyleGAN encoders, our approach can handle a variety of tasks even when the input image is not represented in the StyleGAN domain. We show that solving translation tasks through StyleGAN significantly simplifies the training process (no adversary is required), offers better support for solving tasks without pixel-to-pixel correspondence, and inherently supports multi-modal synthesis via the resampling of styles. Finally, we demonstrate the potential of our framework on a variety of facial image-to-image translation tasks, even when compared to state-of-the-art solutions designed specifically for a single task, and further show that it can be extended beyond the human facial domain.
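The multi-modal synthesis mentioned above rests on the structure of the W+ code: one 512-dimensional style vector per generator layer, where coarse layers govern structure and fine layers govern appearance. A minimal numpy sketch of the style-resampling idea follows; the dimensions (18 styles of size 512, 7 coarse layers) and the function name are illustrative assumptions, not the paper's exact implementation, and the random vectors stand in for styles drawn from StyleGAN's latent distribution.

```python
import numpy as np

# Assumed dimensions for a 1024x1024 StyleGAN generator: a W+ code is
# 18 style vectors (one per generator layer), each 512-dimensional.
N_STYLES, STYLE_DIM = 18, 512

def resample_styles(w_plus, n_coarse=7, rng=None):
    """Sketch of multi-modal synthesis via style resampling: keep the
    encoder's coarse styles (structure/identity) and replace the fine
    styles with freshly sampled ones to vary texture and color."""
    rng = np.random.default_rng() if rng is None else rng
    mixed = w_plus.copy()
    # Overwrite only the fine layers; coarse layers stay fixed.
    mixed[n_coarse:] = rng.standard_normal((N_STYLES - n_coarse, STYLE_DIM))
    return mixed

# In pSp, w_plus would come from the encoder; a zero array stands in here.
w_plus = np.zeros((N_STYLES, STYLE_DIM))
variant = resample_styles(w_plus)
```

Each call with a fresh random generator yields a different fine-style combination, so one input image maps to many plausible outputs once the mixed code is fed to the generator.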