In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows. 1) We propose a non-parametric prior distribution over the appearance of image parts, so that the latent variable ``what-to-draw'' per step becomes a categorical random variable. This improves expressiveness and greatly eases learning compared to the Gaussians used in the literature. 2) We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train than the RNNs used in the literature. 3) We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms previous structured image models like DRAW and AIR and is competitive with other generic generative models. Moreover, we show that our model's inherent compositionality and interpretability bring significant benefits in the low-data learning regime and in latent space editing. Code is available at https://github.com/ZENGXH/NPDRAW.
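The part-by-part canvas-drawing process described above can be illustrated with a minimal NumPy sketch. Everything here is a stand-in for exposition: the part bank is random rather than learned, and the whether/what/where decisions are sampled independently from fixed distributions, whereas NP-DRAW's prior conditions each step on the previous ones via a Transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical part bank: K binary P-by-P patches standing in for the
# learned, non-parametric bank of part appearances.
K, P, H, W, T = 8, 4, 16, 16, 10
part_bank = (rng.random((K, P, P)) > 0.5).astype(float)

canvas = np.zeros((H, W))
for t in range(T):
    if rng.random() < 0.8:            # whether-to-draw: Bernoulli stand-in
        k = rng.integers(K)           # what-to-draw: categorical part index
        y = rng.integers(H - P + 1)   # where-to-draw: top-left location
        x = rng.integers(W - P + 1)
        # Paste the part onto the latent canvas; max keeps overlaps binary.
        canvas[y:y + P, x:x + P] = np.maximum(canvas[y:y + P, x:x + P],
                                              part_bank[k])

# A decoder network would then map `canvas` to the final image.
```

The categorical "what-to-draw" variable is what makes each step an index into a discrete part bank rather than a continuous Gaussian code, which is the expressiveness and trainability point made in contribution 1.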