Deep generative models like StyleGAN hold the promise of semantic image editing: modifying images by their content rather than their pixel values. Unfortunately, working with arbitrary images requires inverting the StyleGAN generator, which has so far remained challenging. Existing inversion approaches obtain promising yet imperfect results, forcing a trade-off between reconstruction quality and downstream editability. To improve quality, these approaches must resort to various techniques that extend the model's latent space after training. Taking a step back, we observe that these methods all essentially propose, in one way or another, to increase the number of free parameters. This suggests that inversion might be difficult because it is underconstrained. In this work, we address this directly and dramatically overparameterize the latent space, before training, with simple changes to the original StyleGAN architecture. Our overparameterization increases the available degrees of freedom, which in turn facilitates inversion. We show that this allows us to obtain near-perfect image reconstruction without the need for encoders or for altering the latent space after training. Our approach also retains editability, which we demonstrate by realistically interpolating between images.
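The inversion problem described above can be illustrated with a minimal sketch: given a generator G and a target image, recover a latent code whose output matches the target by gradient descent on a reconstruction loss. The snippet below uses a toy linear generator in NumPy purely for illustration; StyleGAN's generator is a deep nonlinear network, and the names (`generate`, `w_true`, the learning rate) are illustrative assumptions, not the paper's method.

```python
import numpy as np

# Toy stand-in for a generator: maps a latent code w to an "image" x = G(w).
# (Illustrative only; StyleGAN's mapping is a deep network, not a linear map.)
rng = np.random.default_rng(0)
G = rng.standard_normal((64, 8))  # 8-dim latent -> 64-dim "image"

def generate(w):
    return G @ w

# Target image produced by a hidden latent code we then try to recover.
w_true = rng.standard_normal(8)
x_target = generate(w_true)

# Inversion: minimize ||generate(w) - x_target||^2 over w by gradient descent.
w = np.zeros(8)
lr = 0.002
for _ in range(2000):
    residual = generate(w) - x_target
    grad = 2.0 * G.T @ residual  # gradient of the squared reconstruction error
    w -= lr * grad

loss = float(np.sum((generate(w) - x_target) ** 2))
```

In this overdetermined toy case (64 observations, 8 unknowns) the optimization converges to a near-zero loss; the abstract's point is that for real images and a fixed pretrained generator, no latent code may reproduce the target exactly, which is what motivates adding degrees of freedom.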