Controllable generation is one of the key requirements for the successful adoption of deep generative models in real-world applications, but it remains a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation that represents the joint distribution of data and attributes, and we show how sampling from it can be formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample from. Experimental results show that our method outperforms the state of the art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Moreover, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images at 1024×1024 resolution. Code is available at https://github.com/NVlabs/LACE.
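To illustrate the core idea, here is a minimal, hypothetical sketch of latent-space energy-based sampling: attribute classifiers define a joint energy over a latent code, multiple attribute energies are composed by summation (a logical AND), and sampling follows the energy gradient with simple Euler steps of a gradient-flow ODE. The linear classifiers, step size, and step count are illustrative assumptions, not the paper's actual models or solver.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def joint_energy(z, classifiers, targets):
    # E(z) = sum over attributes of the cross-entropy between a (here,
    # linear) attribute classifier's prediction on z and the target label;
    # summing energies composes attribute constraints conjunctively.
    e = 0.0
    for (W, b), y in zip(classifiers, targets):
        p = softmax(z @ W + b)
        e += -np.log(p[np.arange(len(y)), y]).sum()
    return e

def energy_grad(z, classifiers, targets):
    # analytic gradient of E w.r.t. z for the linear-classifier case
    g = np.zeros_like(z)
    for (W, b), y in zip(classifiers, targets):
        p = softmax(z @ W + b)
        p[np.arange(len(y)), y] -= 1.0
        g += p @ W.T
    return g

def sample(z0, classifiers, targets, steps=100, dt=0.1):
    # Euler discretization of the gradient-flow ODE dz/dt = -grad E(z);
    # the resulting z would be decoded by the pre-trained generator.
    z = z0.copy()
    for _ in range(steps):
        z -= dt * energy_grad(z, classifiers, targets)
    return z
```

In this toy setting, each Euler step moves the latent code toward the region where every attribute classifier agrees with its target, so the joint energy decreases monotonically for a small enough step size.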