Generative Adversarial Networks (GANs) have been widely applied to modeling diverse image distributions. Despite these impressive applications, however, the structure of the GAN latent space remains largely a black box, which leaves controllable generation an open problem, especially when spurious correlations exist between different semantic attributes in the image distribution. To address this problem, previous methods typically learn linear directions or individual channels that control semantic attributes in the image space. However, they often suffer from imperfect disentanglement or cannot provide multi-directional controls. To address these challenges, we propose a novel approach that uses gradient information in the learned GAN latent space to discover nonlinear controls, which enable both multi-directional manipulation and effective disentanglement. More specifically, we first learn interpolation directions by following the gradients of classification networks trained separately on the attributes, and then navigate the latent space by controlling only the channels that are activated for the target attribute in the learned directions. Empirically, with a small amount of training data, our approach gains fine-grained control over a diverse set of bi-directional and multi-directional attributes, and we show, both qualitatively and quantitatively, that it achieves significantly better disentanglement than state-of-the-art methods.
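The two steps described in the abstract (following classifier gradients, then moving only the channels activated for the target attribute) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy classifier `attribute_score`, the top-k rule used in `top_k_mask` to decide which channels count as "activated", and the parameters `steps`, `step_size`, and `k` are all assumptions introduced here for illustration. In practice the gradient would come from a trained attribute classifier applied to the generator's latent code.

```python
import numpy as np

def attribute_score(w, A, b, v):
    """Toy stand-in (assumption) for an attribute classifier's logit on latent code w."""
    return float(v @ np.tanh(A @ w + b))

def attribute_gradient(w, A, b, v):
    """Analytic gradient of attribute_score with respect to w."""
    h = np.tanh(A @ w + b)
    return A.T @ (v * (1.0 - h ** 2))

def top_k_mask(grad, k):
    """Boolean mask of the k channels with the largest |gradient| (assumed selection rule)."""
    mask = np.zeros(grad.shape, dtype=bool)
    mask[np.argsort(np.abs(grad))[-k:]] = True
    return mask

def traverse(w0, A, b, v, steps=20, step_size=0.1, k=8):
    """Follow the classifier gradient from w0, updating only the selected channels.

    The gradient is recomputed at every step, so the resulting path is
    generally curved (nonlinear) rather than a single fixed direction.
    """
    w = w0.copy()
    path = [w.copy()]
    for _ in range(steps):
        g = attribute_gradient(w, A, b, v)
        d = np.where(top_k_mask(g, k), g, 0.0)  # zero out non-selected channels
        norm = np.linalg.norm(d)
        if norm == 0.0:
            break  # no gradient signal left on the selected channels
        w = w + step_size * d / norm
        path.append(w.copy())
    return np.stack(path)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, hidden = 32, 16
    A = rng.normal(size=(hidden, dim)) / np.sqrt(dim)
    b = rng.normal(size=hidden)
    v = rng.normal(size=hidden)
    path = traverse(rng.normal(size=dim), A, b, v)
    print("start score:", attribute_score(path[0], A, b, v))
    print("end score:  ", attribute_score(path[-1], A, b, v))
```

Each row of the returned array is one latent code along the traversal; feeding these codes to a generator would yield the edited images. Restricting each update to a small set of channels is what limits the edit to the target attribute in this sketch, while channels outside the mask are left untouched at that step.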