The high-quality images generated by generative adversarial networks (GANs) have motivated investigations into their use for image editing. However, GANs often provide only limited control for performing specific edits. A principal challenge is the entangled latent space of GANs, which is not directly suited to independent and detailed edits. Recent editing methods allow either controlled style edits or controlled semantic edits. Moreover, methods that use semantic masks to edit images struggle to preserve identity and cannot perform controlled style edits. We propose a method to disentangle a GAN's latent space into semantic and style spaces, enabling controlled semantic and style edits of face images independently within the same framework. To achieve this, we design an encoder-decoder based network architecture ($S^2$-Flow), which incorporates two proposed inductive biases. We demonstrate the suitability of $S^2$-Flow quantitatively and qualitatively by performing various semantic and style edits.