Generative adversarial networks (GANs) have proven to be surprisingly efficient for image editing by inverting and manipulating the latent code corresponding to a natural image. This property emerges from the disentangled nature of the latent space. In this paper, we identify two geometric limitations of such a latent space: (a) Euclidean distances differ from image perceptual distances, and (b) disentanglement is not optimal, so separating facial attributes with a linear model is a limiting assumption. We therefore propose a new method that learns a proxy latent representation using normalizing flows to remedy these limitations, and we show that this leads to a more efficient space for face image editing.
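The abstract does not give implementation details, but the core idea can be illustrated with a minimal sketch: an invertible normalizing flow maps GAN latent codes w to a proxy space z, edits are applied in z, and the result is mapped back. The sketch below (our assumption, in PyTorch) uses RealNVP-style affine coupling layers; all names (`ProxyFlow`, `AffineCoupling`) and hyperparameters are hypothetical, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: half the dimensions condition a
    scale/shift applied to the other half, keeping the map invertible."""
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                      # bounded log-scale for stability
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=-1)

    def inverse(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=-1)

class ProxyFlow(nn.Module):
    """Invertible map from GAN latents w to a proxy space z (and back)."""
    def __init__(self, dim=512, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(AffineCoupling(dim) for _ in range(n_layers))

    def forward(self, w):
        z = w
        for layer in self.layers:
            z = layer(z).flip([-1])            # permute dims between couplings
        return z

    def inverse(self, z):
        w = z
        for layer in reversed(self.layers):
            w = layer.inverse(w.flip([-1]))
        return w

# Edit in the proxy space, then map back to the GAN latent space:
flow = ProxyFlow(dim=512)
w = torch.randn(1, 512)                        # e.g. an inverted StyleGAN latent
z = flow(w)
z_edited = z + 0.5 * torch.randn_like(z)       # hypothetical attribute direction
w_edited = flow.inverse(z_edited)
```

In practice such a flow would be trained so that the proxy space better matches perceptual distances and attribute separability; the training objective is omitted here since the abstract does not specify it.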