To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the choice of reward function within the reinforcement learning approach can play a key role in performance and is still partially unexplored. For this reason, in this paper we propose two reward functions for the task of abstractive summarisation: the first, referred to as RwB-Hinge, dynamically selects the samples for the gradient update; the second, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL pre-trained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
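For illustration only, the sketch below gives one common reading of such reward-based objectives; the function names, signatures, and the exact hinge and risk formulations are assumptions for this sketch, not the paper's definitions. It assumes RwB-Hinge behaves like a REINFORCE-with-baseline loss in which a hinge discards samples whose reward does not beat the baseline, and RISK behaves like a minimum-risk objective over a small pool of candidate summaries.

```python
import torch

def rwb_hinge_loss(sample_logprobs, sample_rewards, baseline_rewards):
    """Hypothetical sketch: REINFORCE with baseline plus a hinge.
    Samples whose reward does not exceed the baseline (e.g. the ROUGE of a
    greedy decode) contribute no gradient; the rest are weighted by their
    advantage over the baseline."""
    advantage = sample_rewards - baseline_rewards        # (batch,)
    advantage = torch.clamp(advantage, min=0.0)          # hinge: keep only improving samples
    return -(advantage * sample_logprobs).mean()

def risk_loss(candidate_logprobs, candidate_rewards):
    """Hypothetical sketch: minimum-risk training over a small candidate pool.
    Minimises the expected risk (1 - reward) under the model's renormalised
    distribution over the candidates."""
    probs = torch.softmax(candidate_logprobs, dim=-1)    # (batch, n_candidates)
    risk = 1.0 - candidate_rewards                       # e.g. 1 - ROUGE score
    return (probs * risk).sum(dim=-1).mean()
```

In this reading, both losses take sequence-level log-probabilities and rewards as inputs, so they can be mixed with, or substituted for, the NLL term during fine-tuning; how the candidates and baselines are actually produced in the paper is not specified here.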