Autoregressive generative models are widely used, especially for tasks involving sequential data. However, the intrinsic characteristics of chain-style conditional modeling give rise to inherent flaws such as exposure bias and a lack of long-range coherence, which severely limit these models' ability to capture the underlying data distribution. In this paper, we propose E-ARM, a method for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we can make the autoregressive model itself an energy-based model for measuring the likelihood of an input without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and increasing temporal coherence for autoregressive generative models. Extensive empirical results on benchmarks including language modeling, neural machine translation, and image generation demonstrate the effectiveness of the proposed approach.
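To make the softmax degree-of-freedom remark concrete, the following is a minimal sketch of the general idea: because softmax is invariant to shifting all logits by a constant, the same unnormalized logits that define the autoregressive next-token distribution can also be read as an (unnormalized) energy score for a whole sequence, with no additional parameters. The toy model `ToyARModel`, the helper names, and the particular energy definition `E(x) = -Σ_t z_t[x_t]` are illustrative assumptions for exposition, not E-ARM's published objective.

```python
# Sketch (assumed, not the paper's exact formulation): reuse an autoregressive
# model's raw logits both as a softmax distribution and as a sequence energy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyARModel(nn.Module):
    """A small GRU language model standing in for any autoregressive model."""
    def __init__(self, vocab_size=100, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.proj(h)                    # logits: (batch, seq_len, vocab)

def ar_log_likelihood(logits, targets):
    # Standard chain-rule likelihood: sum_t log softmax(z_t)[x_t].
    logp = F.log_softmax(logits, dim=-1)
    return logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1).sum(-1)

def sequence_energy(logits, targets):
    # Energy read off the unnormalized logits that softmax's shift-invariance
    # leaves free: E(x) = -sum_t z_t[x_t]; lower energy = higher score.
    z = logits.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return -z.sum(-1)

model = ToyARModel()
x = torch.randint(0, 100, (2, 16))             # two toy token sequences
logits = model(x[:, :-1])                      # predict token t from tokens < t
print(ar_log_likelihood(logits, x[:, 1:]))     # autoregressive view
print(sequence_energy(logits, x[:, 1:]))       # energy-based view, same network
```

The point of the sketch is only that both quantities come from one forward pass of one parameter set; how the energy term is actually trained and regularized is what the paper's objective specifies.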