Despite recent advances in semantic manipulation using StyleGAN, semantic editing of real faces remains challenging. The gap between the $W$ space and the $W+$ space demands an undesirable trade-off between reconstruction quality and editing quality. To solve this problem, we propose to expand the latent space by replacing the fully-connected layers in StyleGAN's mapping network with attention-based transformers. This simple yet effective technique integrates the two aforementioned spaces and transforms them into one new latent space, called $W++$. Our modified StyleGAN maintains the state-of-the-art generation quality of the original StyleGAN with moderately better diversity. More importantly, the proposed $W++$ space achieves superior performance in both reconstruction quality and editing quality. Despite these significant advantages, our $W++$ space supports existing inversion algorithms and editing methods with only negligible modifications, thanks to its structural similarity to the $W/W+$ space. Extensive experiments on the FFHQ dataset show that the proposed $W++$ space is clearly preferable to the previous $W/W+$ space for real face editing. The code is publicly available for research purposes at https://github.com/AnonSubm2021/TransStyleGAN.
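The abstract describes the architectural change only at a high level. Below is a minimal PyTorch sketch of the general idea, not the authors' implementation: it assumes a 512-dimensional latent, 18 style codes (as in 1024x1024 FFHQ generation), and hypothetical names such as `TransformerMappingNetwork` and `layer_tokens`. The learned per-layer query tokens and the off-the-shelf `nn.TransformerEncoder` stand in for whatever attention design the paper actually uses.

```python
import torch
import torch.nn as nn

class TransformerMappingNetwork(nn.Module):
    """Hypothetical sketch: replace StyleGAN's 8-layer MLP mapping
    network with a transformer that emits one style code per
    synthesis layer, yielding a (batch, num_ws, w_dim) tensor with
    the same shape as a W+ code."""

    def __init__(self, z_dim=512, w_dim=512, num_ws=18,
                 num_layers=8, num_heads=8):
        super().__init__()
        # One learned query token per synthesis layer, so attention
        # can specialize each of the num_ws style codes.
        self.layer_tokens = nn.Parameter(torch.randn(num_ws, w_dim))
        self.z_proj = nn.Linear(z_dim, w_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=w_dim, nhead=num_heads,
            dim_feedforward=4 * w_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)

    def forward(self, z):
        # z: (batch, z_dim), sampled from N(0, I) as in StyleGAN.
        z = nn.functional.normalize(z, dim=1)  # analogue of pixel norm
        tokens = self.layer_tokens.unsqueeze(0).expand(z.shape[0], -1, -1)
        # Condition every per-layer token on z, then let self-attention
        # exchange information across layers.
        x = tokens + self.z_proj(z).unsqueeze(1)
        return self.encoder(x)  # (batch, num_ws, w_dim)

w_pp = TransformerMappingNetwork()(torch.randn(4, 512))
print(w_pp.shape)  # torch.Size([4, 18, 512])
```

Because the output has the same per-layer layout as a $W+$ code, existing inversion and editing pipelines could, in principle, operate on it with only minor changes, which is consistent with the compatibility claim in the abstract.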