Current state-of-the-art summarization models are trained with either maximum likelihood estimation (MLE) or reinforcement learning (RL). In this study, we investigate a third training paradigm and argue that inverse reinforcement learning (IRL) may be more suitable for text summarization. IRL focuses on estimating an agent's reward function from observations of that agent's behavior. It is advantageous in settings where the reward function is not explicitly known, or where it is difficult to define the reward or to interact with the environment directly; these are exactly the conditions we observe in summarization. We therefore introduce inverse reinforcement learning to text summarization and define a suite of sub-rewards that are important for summarization optimization. By simultaneously estimating the reward function and optimizing the summarization agent with expert demonstrations, we show that the model trained with IRL produces summaries that more closely follow human behavior, achieving better ROUGE, coverage, novelty, compression ratio, and factuality than baselines trained with MLE and RL.
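A minimal sketch of this setup, assuming (as is common in maximum-entropy IRL, but not spelled out here) that the learned reward is a weighted combination of the sub-rewards; the symbols $r_k$, $\phi_k$, $y^{*}$, $x$, and $\pi_\theta$ are notation introduced for illustration only:

\[
r_\phi(y \mid x) \;=\; \sum_{k} \phi_k \, r_k(y \mid x),
\qquad
\max_{\phi} \;\; \mathbb{E}_{y^{*}}\!\big[\, r_\phi(y^{*} \mid x) \,\big] \;-\; \log \sum_{y} \exp r_\phi(y \mid x),
\]

where $y^{*}$ is an expert (reference) summary for source document $x$, $r_k$ are the sub-rewards with weights $\phi_k$, and the log-partition term is approximated in practice with samples from the current summarization policy $\pi_\theta$, which is in turn updated (e.g., by policy gradient) to maximize the current reward estimate $r_\phi$.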