HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

The inversion of real images into StyleGAN's latent space is a well-studied problem. Nevertheless, applying existing approaches to real-world scenarios remains an open challenge, due to an inherent trade-off between reconstruction and editability: latent space regions which can accurately represent real images typically suffer from degraded semantic control. Recent work proposes to mitigate this trade-off by fine-tuning the generator to add the target image to well-behaved, editable regions of the latent space. While promising, this fine-tuning scheme is impractical for prevalent use as it requires a lengthy training phase for each new image. In this work, we introduce this approach into the realm of encoder-based inversion. We propose HyperStyle, a hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space. A naive modulation approach would require training a hypernetwork with over three billion parameters. Through careful network design, we reduce this to be in line with existing encoders. HyperStyle yields reconstructions comparable to those of optimization techniques with the near real-time inference capabilities of encoders. Lastly, we demonstrate HyperStyle's effectiveness on several applications beyond the inversion task, including the editing of out-of-domain images which were never seen during training.
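
To make the idea of weight modulation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a multiplicative update of the form `theta * (1 + delta)` and assumes that one way to shrink the hypernetwork's output is to predict a single offset per (output channel, input channel) pair, shared across the spatial kernel. Neither assumption is stated in the abstract. The layer shape and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def modulate_weights(theta, delta):
    """Apply an assumed multiplicative modulation: theta * (1 + delta).

    delta may have shape (C_out, C_in, 1, 1); NumPy broadcasting then
    shares each offset across the k x k kernel entries.
    """
    return theta * (1.0 + delta)


def offsets_per_weight(shape):
    """Number of offsets if every single weight gets its own offset."""
    return int(np.prod(shape))


def offsets_per_channel_pair(shape):
    """Number of offsets if each (C_out, C_in) pair shares one offset."""
    c_out, c_in = shape[:2]
    return c_out * c_in


# Hypothetical convolution layer: 64 output channels, 32 input channels, 3x3 kernel.
layer_shape = (64, 32, 3, 3)
rng = np.random.default_rng(0)

theta = rng.standard_normal(layer_shape)               # stand-in for generator weights
delta = 0.01 * rng.standard_normal((64, 32, 1, 1))     # stand-in for predicted offsets

theta_hat = modulate_weights(theta, delta)

naive_count = offsets_per_weight(layer_shape)          # 64 * 32 * 3 * 3
shared_count = offsets_per_channel_pair(layer_shape)   # 64 * 32

print(theta_hat.shape)
print(naive_count, shared_count)
```

In this toy setting, sharing one offset across each 3x3 kernel cuts the number of values the hypernetwork must predict for the layer by a factor of nine. This only illustrates the kind of saving that careful output design can give; it does not reproduce the parameter counts quoted in the abstract.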
