GAN inversion and editing via StyleGAN maps an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation. From the latent space $\mathcal{W}$ to the extended latent space $\mathcal{W^+}$ to the feature space $\mathcal{F}$ in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore $\mathcal{W^+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability. As $\mathcal{W^+}$ and $\mathcal{F}$ are derived from $\mathcal{W}$, which is essentially the foundation latent space of StyleGAN, GAN inversion methods focusing on the $\mathcal{W^+}$ and $\mathcal{F}$ spaces could be improved by stepping back to $\mathcal{W}$. In this work, we propose to first obtain a precise latent code in the foundation latent space $\mathcal{W}$. We introduce contrastive learning to align $\mathcal{W}$ with the image space for precise latent code discovery. Then, we leverage a cross-attention encoder to transform the obtained latent code in $\mathcal{W}$ into $\mathcal{W^+}$ and $\mathcal{F}$, respectively. Our experiments show that our exploration of the foundation latent space $\mathcal{W}$ improves the representation ability of latent codes in $\mathcal{W^+}$ and features in $\mathcal{F}$, which yields state-of-the-art reconstruction fidelity and editability results on the standard benchmarks. Project page: \url{https://github.com/KumapowerLIU/CLCAE}.