Real-world image manipulation has made remarkable progress in recent years, driven by the exploration and use of GAN latent spaces. GAN inversion, the first step in this pipeline, aims to map a real image faithfully to a latent code. Unfortunately, most existing GAN inversion methods fail to meet at least one of three requirements: high reconstruction quality, editability, and fast inference. In this work, we present a novel two-phase strategy that meets all three requirements at once. In the first phase, we train an encoder to map the input image to the StyleGAN2 $\mathcal{W}$-space, which has been shown to offer excellent editability but lower reconstruction quality. In the second phase, we supplement the reconstruction ability of the first phase by leveraging a series of hypernetworks to recover the information lost during inversion. The two phases complement each other: the hypernetwork branch yields high reconstruction quality, while performing the inversion in $\mathcal{W}$-space preserves excellent editability. Our method is entirely encoder-based, which makes inference extremely fast. Extensive experiments on two challenging datasets demonstrate the superiority of our method.
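The two-phase structure described above can be illustrated with a small toy sketch. This is not the authors' implementation: the generator here is a single linear map instead of StyleGAN2, the "encoder" is a pseudo-inverse rather than a trained network, and the "hypernetwork" is a hand-written closed-form rule that outputs a weight offset for the generator. It is also an assumption of this sketch that the hypernetworks act by predicting generator weight offsets, since the abstract does not spell this out. All names (`theta`, `generate`, `phase1_invert`, `hypernetwork`, `invert`) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4  # toy "image" and latent dimensions (hypothetical sizes)

# Frozen toy generator: a linear map standing in for StyleGAN2 (assumption).
theta = rng.normal(size=(D, K))

# Stand-in for the trained phase-1 encoder: least-squares projection
# onto the generator's latent space (assumption, not a learned network).
encoder = np.linalg.pinv(theta)


def generate(w, weights):
    """Toy generator: render an 'image' from latent code w with given weights."""
    return weights @ w


def phase1_invert(x):
    """Phase 1: map the input image to a latent code (editable, but lossy)."""
    return encoder @ x


def hypernetwork(x, x_init, w):
    """Phase 2 stand-in: produce a generator weight offset from the input
    and its initial reconstruction. A real hypernetwork would be learned;
    this closed-form rank-one update is only for illustration."""
    residual = x - x_init
    return np.outer(residual, w) / (w @ w)


def invert(x):
    """Run both phases with forward passes only (no per-image optimization)."""
    w = phase1_invert(x)
    x_init = generate(w, theta)             # phase-1 reconstruction
    delta = hypernetwork(x, x_init, w)      # phase-2 weight offset
    x_final = generate(w, theta + delta)    # refined reconstruction
    return w, x_init, x_final


x = rng.normal(size=D)  # a random vector standing in for a real image
w, x_init, x_final = invert(x)
err_init = np.linalg.norm(x - x_init)
err_final = np.linalg.norm(x - x_final)
print(f"phase-1 error: {err_init:.4f}, after phase 2: {err_final:.2e}")
```

In this toy setting the latent code `w` is left unchanged by phase 2, and only the generator weights are adjusted, which mirrors the idea that editability comes from the latent code while the second phase restores reconstruction detail.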