The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also shows that our models generate more informative summaries with fewer unfaithful errors.
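To make the head-wise positional stride idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of an encoder-decoder attention step in which each head attends only to source positions on its own stride offset, so the heads jointly cover the full input while each one scores only a fraction of it. The function name, shapes, and the exact visibility rule (position j visible to head h iff j % stride == h % stride) are illustrative assumptions based on the abstract's description.

```python
# Sketch of cross-attention with head-wise positional strides (Hepos-style).
# Assumption: head h sees source position j only when j % stride == h % stride.
import numpy as np

def hepos_cross_attention(q, k, v, num_heads, stride):
    """q: (tgt_len, d); k, v: (src_len, d); d must be divisible by num_heads."""
    tgt_len, d = q.shape
    src_len, _ = k.shape
    d_head = d // num_heads
    src_pos = np.arange(src_len)
    outputs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = q[:, sl], k[:, sl], v[:, sl]
        # Restrict this head to its stride offset over the source sequence.
        visible = (src_pos % stride) == (h % stride)
        scores = qh @ kh[visible].T / np.sqrt(d_head)   # (tgt_len, src_len/stride)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over visible positions
        outputs.append(weights @ vh[visible])           # (tgt_len, d_head)
    return np.concatenate(outputs, axis=-1)             # (tgt_len, d)

# Toy usage: with stride 4, each head scores only a quarter of the source tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal((5, 64))
k = rng.standard_normal((20, 64))
v = rng.standard_normal((20, 64))
print(hepos_cross_attention(q, k, v, num_heads=8, stride=4).shape)  # (5, 64)
```

Because each head's score matrix shrinks from (tgt_len, src_len) to roughly (tgt_len, src_len / stride), the memory used by encoder-decoder attention drops by the stride factor, which is what allows far longer inputs than full attention under the same budget.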