Transformer-based models have achieved state-of-the-art performance on short-text summarization. However, they still struggle with long-input summarization. In this paper, we present a new approach for long-input summarization: Dynamic Latent Extraction for Abstractive Summarization. We jointly train an extractor and a generator, treating the extracted text snippets as the latent variable. We propose extractive oracles to provide the extractor with a strong learning signal, and we introduce a consistency loss that encourages the extractor to approximate the averaged dynamic weights predicted by the generator. We conduct extensive experiments on two long-input summarization datasets, GovReport (document) and QMSum (dialogue). Our model significantly outperforms the current state-of-the-art, including a 6.21 ROUGE-2 improvement on GovReport and a 2.13 ROUGE-1 improvement on QMSum. Further analysis shows that the dynamic weights make our generation process highly interpretable. Our code will be publicly available upon publication.
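To make the consistency objective concrete, below is a minimal sketch of one way it could be computed, assuming the loss is a KL divergence between the extractor's snippet distribution and the generator's averaged, normalized dynamic weights; the function name, the KL formulation, and the stop-gradient on the generator side are illustrative assumptions, not a specification of the paper's implementation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(extractor_scores: torch.Tensor,
                     generator_weights: torch.Tensor) -> torch.Tensor:
    """Hypothetical consistency term: push the extractor's distribution over
    the K extracted snippets toward the generator's averaged dynamic weights.

    extractor_scores:  (K,) unnormalized relevance scores from the extractor.
    generator_weights: (K,) dynamic snippet weights from the generator,
                       averaged over decoding steps and summing to 1.
    """
    log_p_ext = F.log_softmax(extractor_scores, dim=-1)
    # Treat the generator's averaged weights as the target distribution;
    # detach so this loss term only updates the extractor (an assumption).
    target = generator_weights.detach()
    return F.kl_div(log_p_ext, target, reduction="sum")
```

In this sketch the term would simply be added, with a weighting coefficient, to the generation loss during joint training.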