We propose a new approach that generates multiple variants of the target summary with diverse content and varying lengths, then scores and selects admissible ones according to users' needs. Abstractive summarizers trained on single reference summaries may struggle to produce outputs that achieve multiple desirable properties, i.e., capturing the most important information, being faithful to the original, and being grammatical and fluent. We therefore adopt a two-stage strategy: stage one generates a diverse set of candidate summaries from the source text, and stage two scores and selects admissible ones. Importantly, our generator gives precise control over the length of the summary, which makes it especially well-suited to settings where space is limited. Our selectors are designed to predict the optimal summary length and put special emphasis on faithfulness to the original text. Both stages can be effectively trained, optimized and evaluated. Our experiments on benchmark summarization datasets suggest that this paradigm can achieve state-of-the-art performance.
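The generate-then-select paradigm can be illustrated with a minimal sketch of stage two only. It assumes each stage-one candidate already carries two hypothetical scores in [0, 1], `salience` and `faithfulness`, treats a candidate as admissible when its whitespace token count fits a user-given budget, and ranks admissible candidates by a weighted sum that favors faithfulness. The `Candidate` class, the `select_summary` function and the `faithfulness_weight` parameter are illustrative names, not the authors' code; the paper's selectors are trained models that also predict the optimal summary length, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """One stage-one output; the two scores are hypothetical inputs in [0, 1]."""
    text: str
    salience: float      # how much important content the candidate captures
    faithfulness: float  # how well the candidate is supported by the source

    @property
    def length(self) -> int:
        # Assumption: length is counted in whitespace-delimited tokens.
        return len(self.text.split())


def select_summary(
    candidates: Sequence[Candidate],
    max_length: int,
    faithfulness_weight: float = 0.7,
) -> Optional[Candidate]:
    """Stage-two sketch: filter by length budget, then rank by weighted score."""
    # Admissible here means: fits within the user's length budget.
    admissible = [c for c in candidates if c.length <= max_length]
    if not admissible:
        return None

    def score(c: Candidate) -> float:
        # A weight above 0.5 puts extra emphasis on faithfulness.
        return faithfulness_weight * c.faithfulness + (1 - faithfulness_weight) * c.salience

    return max(admissible, key=score)


if __name__ == "__main__":
    candidates = [
        Candidate("The council approved the budget.", salience=0.6, faithfulness=0.95),
        Candidate("The council approved the budget after a long debate on taxes.",
                  salience=0.9, faithfulness=0.9),
        Candidate("The council rejected the budget.", salience=0.7, faithfulness=0.1),
    ]
    # A tight budget excludes the 11-token candidate; the unfaithful one loses on score.
    print(select_summary(candidates, max_length=8).text)
    # → The council approved the budget.
    # A looser budget admits the longer, more informative candidate.
    print(select_summary(candidates, max_length=20).text)
    # → The council approved the budget after a long debate on taxes.
```

The same candidate pool serves different length budgets without regenerating anything, which is the practical appeal of separating generation from selection.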