Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step. Previous work shows the efficacy of jointly scoring and selecting sentences with neural sequence generation models. It is, however, not well understood whether the gain is due to better encoding techniques or better redundancy reduction approaches. Similarly, the respective contributions of the salience and diversity components to the created summary are not well studied. Building on state-of-the-art encoding methods for summarization, we present two adaptive learning models: AREDSUM-SEQ, which jointly considers salience and novelty during sentence selection; and a two-step AREDSUM-CTX, which scores salience first and then learns to balance salience and redundancy, enabling the measurement of the impact of each aspect. Empirical results on the CNN/DailyMail and NYT50 datasets show that by modeling diversity explicitly in a separate step, AREDSUM-CTX achieves significantly better performance than AREDSUM-SEQ as well as state-of-the-art extractive summarization baselines.
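The two-step idea of scoring salience first and then trading it off against redundancy with the partial summary can be illustrated with a simple MMR-style greedy selector. This is a minimal hand-rolled sketch, not the paper's learned AREDSUM-CTX model: salience scores are assumed to be given, redundancy is approximated by token overlap, and the trade-off weight `lam` is a fixed hyperparameter rather than a learned balance.

```python
# Illustrative MMR-style two-step selection (NOT the AREDSUM-CTX model):
# step 1 scores salience (assumed precomputed here); step 2 greedily picks
# sentences, penalizing overlap with the already-selected summary.

def overlap(a, b):
    """Jaccard overlap between the token sets of two sentences."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_sentences(sentences, salience, k=3, lam=0.5):
    """Greedily select k sentences, balancing salience against
    redundancy with the partial summary (lam weights salience)."""
    selected = []                          # indices already in the summary
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # redundancy = worst-case overlap with any selected sentence
            red = max((overlap(sentences[i], sentences[j]) for j in selected),
                      default=0.0)
            return lam * salience[i] - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]
```

With a high-salience but near-duplicate candidate, the redundancy penalty makes the selector prefer a more diverse, slightly less salient sentence; a purely salience-ranked selector would pick the duplicate instead, which is the contrast the two AREDSUM variants are designed to measure.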