Transformer-based models have achieved state-of-the-art performance on short-input summarization. However, they still struggle with summarizing longer text. In this paper, we present DYLE, a novel dynamic latent extraction approach for abstractive long-input summarization. DYLE jointly trains an extractor and a generator and treats the extracted text snippets as the latent variable, allowing dynamic snippet-level attention weights during decoding. To provide adequate supervision, we propose simple yet effective heuristics for oracle extraction as well as a consistency loss term, which encourages the extractor to approximate the averaged dynamic weights predicted by the generator. We evaluate our method on different long-document and long-dialogue summarization tasks: GovReport, QMSum, and arXiv. Experimental results show that DYLE outperforms all existing methods on GovReport and QMSum, with gains of up to 6.1 ROUGE, while yielding strong results on arXiv. Further analysis shows that the proposed dynamic weights make our generation process interpretable.
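To make the consistency term concrete, the following is a minimal sketch of one plausible instantiation of the idea described above; the notation ($p_E$, $\alpha_t$, $s_k$, $T$) is illustrative and not necessarily the paper's own. Let $p_E(s_k \mid x)$ denote the extractor's probability of selecting snippet $s_k$ from input $x$, and let $\alpha_t(s_k)$ denote the generator's dynamic weight on $s_k$ at decoding step $t$. Averaging the dynamic weights over the $T$ decoding steps yields a target distribution, and a KL divergence pulls the extractor toward it:
\[
\bar{\alpha}(s_k) \;=\; \frac{1}{T}\sum_{t=1}^{T} \alpha_t(s_k),
\qquad
\mathcal{L}_{\text{consist}}
\;=\;
\mathrm{KL}\!\left(\bar{\alpha} \,\middle\|\, p_E\right)
\;=\;
\sum_{k} \bar{\alpha}(s_k)\,\log\frac{\bar{\alpha}(s_k)}{p_E(s_k \mid x)}.
\]
In such a formulation one would typically treat the generator-side average $\bar{\alpha}$ as a fixed target (i.e., detached from the gradient) so that this term updates only the extractor; whether DYLE does exactly this is not stated in the abstract and is an assumption here.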