This paper addresses the problem of finding interpretable directions in the latent space of pre-trained Generative Adversarial Networks (GANs) to facilitate controllable image synthesis. Such interpretable directions correspond to transformations that can affect both the style and the geometry of the synthetic images. However, existing approaches that utilise linear techniques to find these transformations often fail to provide an intuitive way to separate these two sources of variation. To address this, we propose to (a) perform a multilinear decomposition of the tensor of intermediate representations, and (b) use a tensor-based regression to map directions found via this decomposition to the latent space. Our scheme allows for both linear edits, corresponding to the individual modes of the tensor, and non-linear ones that model the multiplicative interactions between them. We show experimentally that the former can be utilised to better separate style- from geometry-based transformations, while the latter generates a wider set of possible transformations than prior works. We demonstrate the efficacy of our approach both quantitatively and qualitatively against the current state-of-the-art.
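To make the two-step recipe in the abstract concrete, below is a minimal sketch of the general idea, assuming a Tucker-style multilinear decomposition and an ordinary least-squares mapping back to the latent space; the paper's actual decomposition, regression model, and tensor layout are not specified here, and all names (`Z`, `T`, the mode semantics) are illustrative stand-ins rather than the authors' implementation.

```python
# Hypothetical sketch: multilinear decomposition of intermediate GAN features
# plus a regression that maps per-sample multilinear coefficients to latent
# edits. Uses the tensorly library's Tucker decomposition as an assumed proxy
# for the paper's multilinear decomposition.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

rng = np.random.default_rng(0)

# Stand-ins for GAN quantities: latent codes z and the intermediate
# representations they generate (samples x channels x height x width).
n_samples, latent_dim = 256, 64
Z = rng.standard_normal((n_samples, latent_dim))      # latent codes
T = rng.standard_normal((n_samples, 32, 8, 8))        # placeholder features

# (a) Multilinear (Tucker) decomposition of the feature tensor. Each factor
# matrix spans one mode of variation (e.g. channel-like "style" modes vs
# spatial "geometry" modes), which is what permits mode-wise linear edits.
core, factors = tucker(tl.tensor(T), rank=[16, 8, 4, 4])
U_samples = tl.to_numpy(factors[0])                   # (n_samples, 16)

# (b) Least-squares regression mapping per-sample multilinear coefficients
# back to the latent space, so a direction in a given mode becomes a
# latent-space edit direction.
W, *_ = np.linalg.lstsq(U_samples, Z, rcond=None)     # (16, latent_dim)

# A linear edit along component k of the sample mode, applied in latent space.
k, alpha = 0, 3.0
z_edited = Z[0] + alpha * W[k]
```

In this sketch, non-linear edits of the kind the abstract mentions would correspond to jointly perturbing several modes so their multiplicative interactions through the core tensor come into play, rather than moving along a single factor column.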