Recent generative adversarial networks (GANs) can generate impressively photo-realistic images. However, controllable generation with GANs remains a challenging research problem. Achieving controllable generation requires semantically interpretable and disentangled factors of variation, which is difficult with simple fixed distributions such as a Gaussian. Instead, we propose an unsupervised framework that learns a distribution of latent codes controlling the generator through self-training. Self-training provides iterative feedback during GAN training, from the discriminator to the generator, and progressively improves the latent-code proposals as training proceeds. The latent codes are sampled from a latent variable model learned in the feature space of the discriminator. We consider a normalized independent component analysis (ICA) model and learn its parameters through tensor factorization of its higher-order moments. Our framework exhibits better disentanglement than alternatives such as the variational autoencoder, and discovers semantically meaningful latent codes without any supervision. We demonstrate empirically on both car and face datasets that each group of elements in the learned code controls a mode of variation with a clear semantic meaning, e.g., pose or background change. We also show with quantitative metrics that our method generates better results than competing approaches.
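The abstract's core technical ingredient, learning an ICA model by tensor factorization of higher-order moments, can be illustrated on a toy problem. The sketch below is not the paper's implementation; it assumes a synthetic setting with skewed independent sources, whitens the observations, builds the empirical third-order moment tensor, and recovers component directions with the tensor power method plus deflation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not the paper's data): 3 independent,
# zero-mean, skewed (non-Gaussian) sources, linearly mixed.
n, d = 20000, 3
S = rng.exponential(size=(n, d)) - 1.0   # mean 0, nonzero skewness
A = rng.normal(size=(d, d))              # unknown mixing matrix
X = S @ A.T

# Whitening makes the effective mixing orthogonal, so the moment
# tensor of the whitened data is (approximately) orthogonally
# decomposable and the tensor power method applies.
X = X - X.mean(axis=0)
eigval, eigvec = np.linalg.eigh(X.T @ X / n)
W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
Z = X @ W

# Empirical third-order moment tensor T[i, j, k] = E[z_i z_j z_k].
T = np.einsum('ni,nj,nk->ijk', Z, Z, Z) / n

def tensor_power_iteration(T, iters=200):
    """Recover one component direction: v <- T(I, v, v), normalized."""
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, v, v)
        v /= np.linalg.norm(v)
    return v

# Deflation: extract a component, subtract its rank-1 term, repeat.
components = []
for _ in range(d):
    v = tensor_power_iteration(T)
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)
    T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
    components.append(v)
components = np.array(components)   # rows ~ orthonormal ICA directions
```

In the paper's setting the analogous model would be fit to discriminator features rather than raw synthetic data, and the recovered directions would parameterize the distribution from which latent codes are sampled.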