Abstractive dialogue summarization has received increasing attention recently. Although most current dialogue summarization systems are trained to maximize the likelihood of human-written summaries and have achieved strong results, there remains a large gap in generating summaries that humans judge to be high quality, e.g., coherent and faithful, partly because maximizing the likelihood of a single human-written reference is misaligned with these goals. To this end, we propose to incorporate different levels of human feedback into the training process, guiding the models to capture the behaviors humans care about in summaries. Specifically, we ask humans to highlight the salient information to be included in summaries as the local feedback, and to make overall comparisons among summaries in terms of coherence, accuracy, coverage, conciseness, and overall quality as the global feedback. We then combine both local and global feedback to fine-tune the dialogue summarization policy with Reinforcement Learning. Experiments conducted on multiple datasets demonstrate the effectiveness and generalization of our method over state-of-the-art supervised baselines, especially in terms of human judgments.
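To make the training setup concrete, below is a minimal, illustrative sketch of how a local (highlight-based) reward and a global (comparison-based) reward could be combined into a single scalar used for a REINFORCE-style policy-gradient update. The weighting scheme, reward functions, and function names here are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def combined_reward(local_reward: float, global_reward: float,
                    w_local: float = 0.5, w_global: float = 0.5) -> float:
    """Weighted combination of a local reward (e.g., coverage of human-highlighted
    salient spans) and a global reward (e.g., score from a comparison-based model).
    The weights are illustrative assumptions."""
    return w_local * local_reward + w_global * global_reward

def reinforce_loss(log_probs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    """REINFORCE policy-gradient loss for one sampled summary.

    log_probs: per-token log-probabilities of the sampled summary under the policy
    reward:    scalar combined reward for the whole summary
    baseline:  variance-reduction baseline (e.g., reward of a greedy sample)
    """
    advantage = reward - baseline
    return -(advantage * log_probs.sum())

# Usage with dummy values standing in for model outputs and feedback scores:
token_probs = torch.tensor([0.4, 0.6, 0.5], requires_grad=True)
log_probs = torch.log(token_probs)
r = combined_reward(local_reward=0.7, global_reward=0.3)
loss = reinforce_loss(log_probs, r, baseline=0.5)
loss.backward()  # gradients flow back to the policy parameters in a real model
```

In practice the global reward would typically come from a preference model trained on the human comparisons, and the local reward from overlap between the generated summary and the human-highlighted spans; the sketch above only shows how the two signals could be merged into one RL objective.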