The StyleGAN family has succeeded in high-fidelity image generation and allows for flexible and plausible editing of generated images by manipulating the semantically rich latent style space. However, projecting a real image into this latent space encounters an inherent trade-off between inversion quality and editability. Existing encoder-based or optimization-based StyleGAN inversion methods attempt to mitigate this trade-off but achieve only limited performance. To fundamentally resolve the problem, we propose a novel two-phase framework that designates two separate networks to handle editing and reconstruction respectively, instead of balancing the two within a single network. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing, which ensures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network corrects the inversion errors and performs ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. We further evaluate the rectifying network and observe strong generalizability to unseen manipulation types and out-of-domain images.
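A minimal sketch of the two-phase pipeline described above, assuming PyTorch-style modules; the names `encoder`, `generator`, `rectifier`, the latent `direction`, and the rectifier's input signature are hypothetical placeholders for illustration, not the paper's actual interfaces:

```python
import torch

@torch.no_grad()
def edit_real_image(x, encoder, generator, rectifier, direction, alpha=1.0):
    """Invert a real image, edit it in W space, then rectify errors.

    x:         real input image, shape (N, 3, H, W)
    encoder:   Phase-I inversion network mapping images to W-space codes
    generator: pretrained StyleGAN generator
    rectifier: Phase-II network that corrects reconstruction errors
    direction: a semantic editing direction in W space (hypothetical)
    alpha:     editing strength
    """
    # Phase I: project the image into the editable W space.
    w = encoder(x)

    # The coarse reconstruction is editable but lossy.
    x_rec = generator(w)

    # Apply a semantic edit by moving along a latent direction.
    x_edit = generator(w + alpha * direction)

    # Phase II (assumed interface): the rectifier estimates the inversion
    # error from the (input, coarse reconstruction) pair and transfers the
    # correction to the edited image, yielding a near-perfect edited result.
    return rectifier(x, x_rec, x_edit)
```

Under this decomposition, Phase I only needs to produce a latent code good enough for editing, while Phase II absorbs the reconstruction burden, so the trade-off no longer has to be balanced inside one network.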