Transformer-based models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks, including document summarization. Typically, these systems are trained by fine-tuning a large pre-trained model on the target task. One issue with these transformer-based models is that their memory and compute requirements do not scale well as the input length grows. Thus, for long-document summarization, it can be challenging to train or fine-tune these models. In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization using two methods: local self-attention and explicit content selection. These approaches are compared across a range of network configurations. Experiments are carried out on standard long-span summarization tasks, including the Spotify Podcast, arXiv, and PubMed datasets. We demonstrate that by combining these methods, we achieve state-of-the-art ROUGE scores on all three tasks. Moreover, without a large-scale GPU card, our approach achieves results comparable to or better than existing approaches.
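To make the first of the two methods concrete, the sketch below illustrates the masking pattern behind windowed local self-attention, where each position attends only to neighbours within a fixed window so that cost grows with the window size rather than the full sequence length. This is a minimal illustrative example, not the authors' implementation: the function name, the `window` parameter, and the single-head, unbatched shapes are assumptions, and for clarity it materializes the full score matrix, whereas an efficient implementation would compute only the banded scores.

```python
import torch
import torch.nn.functional as F


def local_self_attention(q, k, v, window: int):
    """Windowed (local) self-attention sketch.

    q, k, v: tensors of shape (seq_len, d_model).
    Each position attends only to keys within +/- `window` positions.
    Note: this dense-masked version is for illustration only; it still
    builds the full (seq_len, seq_len) score matrix.
    """
    seq_len, d_model = q.shape
    scores = q @ k.transpose(0, 1) / d_model ** 0.5        # (seq_len, seq_len)

    # Mask out scores for positions outside the local window.
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs()             # pairwise distances
    scores = scores.masked_fill(dist > window, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return weights @ v                                     # (seq_len, d_model)


# Toy usage: 16 positions, 8-dimensional embeddings, window of 3.
x = torch.randn(16, 8)
out = local_self_attention(x, x, x, window=3)
print(out.shape)  # torch.Size([16, 8])
```

The design point this illustrates is that restricting attention to a local band reduces the effective cost per position from the full sequence length to roughly twice the window size, which is what makes fine-tuning on long inputs feasible on modest hardware.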