Variational autoencoders have been widely applied to natural language generation; however, two long-standing problems remain: information under-representation and posterior collapse. The former arises because only the last hidden state of the encoder is mapped to the latent space, which is insufficient to summarize the data. The latter results from the imbalance between the reconstruction loss and the KL divergence in the objective function. To tackle these issues, in this paper we propose a discrete variational attention model that places a categorical distribution over the attention mechanism, motivated by the discrete nature of language. Our approach is combined with an auto-regressive prior to capture the sequential dependencies in the observations, which enriches the latent space for language generation. Moreover, owing to this discreteness, training our approach does not suffer from posterior collapse. Furthermore, we carefully analyze the advantages of a discrete latent space over a continuous one with the commonly used Gaussian distribution. Extensive experiments on language generation demonstrate the advantages of our approach over state-of-the-art counterparts.
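As background for the imbalance issue mentioned above, the evidence lower bound of a VAE takes the standard form shown below. This is a generic sketch using assumed notation ($z$ for the latent attention variables, $q_\phi$ for the variational posterior, $p_\theta$ for the decoder and prior), not the paper's exact objective:
\[
\mathcal{L}(\theta,\phi;x) \;=\; \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p_\theta(z)\big),
\qquad
p_\theta(z) \;=\; \prod_{t} p_\theta(z_t \mid z_{<t}),
\]
where, consistent with the abstract, each $q_\phi(z_t \mid x)$ is a categorical distribution over attention positions and the prior factorizes auto-regressively (the per-step factorization is an assumption for illustration). Posterior collapse corresponds to the optimizer driving the KL term to zero so that the decoder ignores $z$; the abstract's claim is that this degenerate solution is avoided when the latent variables are discrete.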