A deep generative model is characterized by a representation space, its distribution, and a neural network mapping the representation to a distribution over vectors in feature space. Common methods such as variational autoencoders (VAEs) apply variational inference for training the neural network, but optimizing these models is often non-trivial. The encoder adds to the complexity of the model and introduces an amortization gap, and the quality of the variational approximation is usually unknown. Additionally, the balance of the loss terms of the objective function heavily influences performance. Therefore, we argue that it is worthwhile to investigate a much simpler approximation which finds representations and their distribution by maximizing the model likelihood via back-propagation. In this approach, there is no encoder, and we therefore call it a Deep Generative Decoder (DGD). Using the CIFAR10 data set, we show that the DGD is easier and faster to optimize than the VAE, more consistently achieves low reconstruction errors on test data, and alleviates the problem of balancing the reconstruction and distribution loss terms. Although the model in its simple form cannot compete with state-of-the-art image generation approaches, it obtains better image generation scores than the variational approach on the CIFAR10 data. We demonstrate on MNIST data how the use of a Gaussian mixture with priors can lead to a clear separation of classes in a 2D representation space, and how the DGD can be used with labels to obtain a supervised representation.
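To make the encoder-free idea concrete, the following is a minimal sketch of a DGD-style training loop in PyTorch, under stated assumptions: a toy multilayer-perceptron decoder, a learnable diagonal Gaussian mixture with uniform component weights over the representation space, and a unit-variance Gaussian observation model. The names (`Decoder`, `train_dgd`) and all hyperparameters are illustrative assumptions, not the paper's exact architecture; the key point is that each training example gets a free, learnable representation vector optimized jointly with the decoder by back-propagation.

```python
import math

import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Maps a representation vector to a point in feature space."""

    def __init__(self, latent_dim: int, data_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def train_dgd(x: torch.Tensor, latent_dim: int = 2,
              n_components: int = 10, epochs: int = 100):
    n, data_dim = x.shape
    decoder = Decoder(latent_dim, data_dim)

    # One free, learnable representation per training example; this
    # replaces the encoder of a VAE.
    z = nn.Parameter(0.1 * torch.randn(n, latent_dim))

    # Learnable diagonal Gaussian mixture over representation space
    # (uniform component weights, for brevity).
    means = nn.Parameter(torch.randn(n_components, latent_dim))
    log_vars = nn.Parameter(torch.zeros(n_components, latent_dim))

    opt = torch.optim.Adam([z, means, log_vars, *decoder.parameters()],
                           lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()

        # Negative log-likelihood of x given z under a unit-variance
        # Gaussian observation model (squared error up to a constant).
        nll_recon = 0.5 * ((decoder(z) - x) ** 2).sum(dim=1)

        # log p(z) under the mixture, via (n, k, d) broadcasting;
        # additive log(2*pi) constants are dropped since they do not
        # affect the gradients.
        diff = z.unsqueeze(1) - means.unsqueeze(0)
        log_comp = -0.5 * ((diff ** 2) / log_vars.exp()
                           + log_vars).sum(dim=2)
        log_pz = torch.logsumexp(log_comp, dim=1) - math.log(n_components)

        # Maximizing the model likelihood = minimizing reconstruction
        # NLL minus the log-prior of the representations.
        loss = (nll_recon - log_pz).mean()
        loss.backward()
        opt.step()
    return decoder, z.detach()
```

In a setup like this, a held-out example would presumably be represented by optimizing a fresh representation vector against the frozen decoder, which is one way the reconstruction error of test data mentioned above can be evaluated without an encoder.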