Controllable generation is one of the key requirements for the successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes, and we show how sampling from it can be formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Moreover, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images at 1024×1024 resolution.
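The composition of energy functions with logical operators mentioned above follows from the EBM identity p(z|c) ∝ exp(-E(z, c)): a conjunction of attributes corresponds to a product of densities (a sum of energies), a disjunction to a mixture, and a negation to an inverted energy. A minimal sketch of these rules on toy quadratic energies (the attribute names and energies here are purely illustrative, not the paper's trained classifiers):

```python
import numpy as np

# Hedged sketch of logical composition of energy functions.
# In an EBM, p(z|c) ∝ exp(-E(z, c)), so lower energy = more likely.

def e_and(e1, e2):
    """Conjunction: product of densities = sum of energies."""
    return e1 + e2

def e_or(e1, e2):
    """Disjunction: mixture of densities ~ -log(exp(-e1) + exp(-e2))."""
    return -np.logaddexp(-e1, -e2)

def e_not(e, alpha=1.0):
    """Negation (heuristic): invert the preference by flipping the sign."""
    return -alpha * e

# Toy 1-D latent space with two hypothetical attribute energies,
# minimized at z = -1 and z = +1 respectively.
z = np.linspace(-3.0, 3.0, 601)
e_smile = (z + 1.0) ** 2   # hypothetical "smiling" energy
e_young = (z - 1.0) ** 2   # hypothetical "young" energy

# "smiling AND young" concentrates mass between the two minima,
# while "smiling OR young" keeps a mode near each one.
z_and = z[np.argmin(e_and(e_smile, e_young))]
z_or = z[np.argmin(e_or(e_smile, e_young))]
print(z_and)  # minimum of 2z^2 + 2, i.e. z = 0
```

In the paper's setting, each per-attribute energy would come from a trained attribute classifier in the generator's latent space, and sampling from the composed energy would be done by solving the associated ODE rather than by grid search as in this toy example.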