The input and output of most text generation tasks can be transformed into two sequences of tokens, and these tasks can therefore be modeled with sequence-to-sequence learning tools such as Transformers. Such models are usually trained by maximizing the likelihood of the output text sequence, assuming the input sequence and all gold preceding tokens are given during training; during inference, however, the model suffers from the exposure bias problem (i.e., during beam search it only has access to its previously predicted tokens rather than gold tokens). In this paper, we propose MoCa ({\bf Mo}mentum {\bf Ca}libration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search, and it learns to align the model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently improves strong pre-trained Transformers over vanilla fine-tuning, and we achieve state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
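A minimal sketch of the two ingredients the abstract describes, written in PyTorch: an exponential moving average ("momentum") update that keeps a slowly evolving copy of the generator, and a pairwise ranking loss that pushes the model's sequence-level scores of beam candidates to agree with their actual qualities. The specific loss form, the hyperparameters (`beta`, `margin`), and the quality metric are illustrative assumptions, not the paper's exact implementation.

```python
import torch

@torch.no_grad()
def momentum_update(online_model, momentum_model, beta=0.999):
    """Move the momentum generator's weights slowly toward the online model's
    weights (an exponential moving average), so the beam-search samples it
    produces evolve slowly but stay consistent with the model being trained.
    `beta` is an assumed hyperparameter for illustration."""
    for p_m, p_o in zip(momentum_model.parameters(), online_model.parameters()):
        p_m.mul_(beta).add_(p_o, alpha=1.0 - beta)

def calibration_loss(scores, qualities, margin=0.01):
    """Pairwise ranking loss (one plausible instantiation of 'aligning model
    scores with actual qualities'): sequence-level model scores of beam
    candidates should order the candidates the same way their qualities
    (e.g., ROUGE against the reference) do."""
    order = torch.argsort(qualities, descending=True)
    scores = scores[order]                      # scores sorted by decreasing quality
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # candidate i has higher quality than candidate j, so its score
            # should exceed candidate j's by a rank-dependent margin
            loss = loss + torch.relu(scores[j] - scores[i] + margin * (j - i))
    return loss / (n * (n - 1) / 2)

# Toy usage: four beam candidates with model scores and reference-based qualities.
scores = torch.tensor([-1.2, -0.8, -1.5, -1.0], requires_grad=True)
qualities = torch.tensor([0.45, 0.30, 0.50, 0.20])
calibration_loss(scores, qualities).backward()
print(scores.grad)
```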