Prior studies on text-to-text generation typically assume that the model can figure out what to attend to in the input and what to include in the output through seq2seq learning alone, with only parallel training data and no additional guidance. However, it remains unclear whether current models preserve important concepts from the source input, since seq2seq learning places no explicit focus on these concepts and commonly used evaluation metrics treat them as no more important than other tokens. In this paper, we present a systematic analysis of whether current seq2seq models, especially pre-trained language models, are good enough at preserving important input concepts, and of the extent to which explicitly guiding generation with these concepts as lexical constraints is beneficial. We answer these questions through extensive analytical experiments on four representative text-to-text generation tasks. Based on our observations, we then propose a simple yet effective framework to automatically extract, denoise, and enforce important input concepts as lexical constraints. The new method performs comparably to or better than its unconstrained counterpart on automatic metrics, achieves higher coverage of important concepts, and receives better ratings in human evaluation. Our code is available at https://github.com/morningmoni/EDE.
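To make the idea of "enforcing input concepts as lexical constraints" concrete, the following is a minimal sketch, not the paper's EDE implementation: it extracts candidate concepts with spaCy noun chunks (the paper additionally denoises its constraints) and enforces them with Hugging Face's constrained beam search via `force_words_ids`. The `en_core_web_sm` and `facebook/bart-large-cnn` model choices and the example source text are illustrative assumptions.

```python
# Minimal sketch: extract concepts from the source and force them to appear
# in the generated output via constrained beam search.
import spacy
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

nlp = spacy.load("en_core_web_sm")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

source = ("The new transit plan adds dedicated bus lanes downtown "
          "and cuts average commute times by fifteen minutes.")

# Extract candidate concepts (here: noun chunks) from the source text.
concepts = [chunk.text for chunk in nlp(source).noun_chunks]

# Encode each concept as a token sequence that must appear in the output.
# The leading space makes BPE tokenization match mid-sentence occurrences.
force_words_ids = [
    tokenizer(" " + concept, add_special_tokens=False).input_ids
    for concept in concepts
]

inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(
    **inputs,
    force_words_ids=force_words_ids,
    num_beams=5,          # constrained beam search requires num_beams > 1
    max_new_tokens=60,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The constrained decoder only guarantees that the forced phrases appear; which concepts are worth forcing is exactly the extraction and denoising question the paper studies.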