Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is generally very complex and leads to slow generation. In this work, we propose a new model that trains the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between a continuous coded vector and a non-informative prior distribution; the discrete latent states are then given as random functions of these continuous vectors. We show that our model is competitive with the autoregressive prior on the mini-ImageNet and CIFAR datasets and is efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.
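To make the two ingredients concrete, below is a minimal sketch, not the paper's exact formulation: it assumes a PyTorch setting, a variance-preserving Gaussian bridge z_t = sqrt(1-t) z + sqrt(t) eps toward a non-informative N(0, I) prior, and a stochastic quantizer that samples a discrete state as a random function of the continuous vector via a softmax over negative squared distances to the codebook. The codebook size K, dimension D, and temperature tau are illustrative choices, not values from the paper.

import torch
import torch.nn.functional as F

K, D = 512, 64                      # illustrative codebook size and embedding dim
codebook = torch.randn(K, D)        # stands in for the learned embeddings

def stochastic_quantize(z, tau=1.0):
    """Sample a discrete state: p(k | z) proportional to exp(-||z - e_k||^2 / tau)."""
    d2 = torch.cdist(z, codebook).pow(2)           # (B, K) squared distances
    probs = F.softmax(-d2 / tau, dim=-1)           # categorical over the K codes
    idx = torch.multinomial(probs, 1).squeeze(-1)  # sampled code indices, shape (B,)
    return idx, codebook[idx]

def bridge_step(z, t):
    """One noising step toward N(0, I): z_t = sqrt(1-t) * z + sqrt(t) * eps."""
    eps = torch.randn_like(z)
    return (1 - t) ** 0.5 * z + t ** 0.5 * eps

z = torch.randn(8, D)                      # continuous encoder output (batch of 8)
z_noisy = bridge_step(z, t=0.3)            # partially diffused latent
indices, quantized = stochastic_quantize(z_noisy)
print(indices.shape, quantized.shape)      # torch.Size([8]) torch.Size([8, 64])

At t = 1 the latent is pure N(0, I) noise, so sampling can start from the non-informative prior and run the bridge in reverse; because the discrete states are sampled from the continuous vectors rather than fixed by a separately trained autoregressive model, the quantizer and the bridge can be optimized jointly, which is the end-to-end training the abstract refers to.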