Recent Transformer-based summarization models offer a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to handle more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have two shortcomings: (1) they often perform poorly in content selection, and (2) their training strategy is inefficient, which limits model performance. In this paper, we explore two orthogonal ways to address these shortcomings. First, we augment the Transformer network with a sentence cross-attention module in the decoder, encouraging greater abstraction of salient content. Second, we introduce a curriculum learning approach that reweights the training samples, yielding a more efficient learning procedure. The second approach, which improves the training strategy of Transformer networks, brings stronger gains than the first. We apply our model to the extreme summarization dataset of Reddit TIFU posts. We further examine three cross-domain summarization datasets (Webis-TLDR-17, CNN/DM, and XSum) to measure the efficacy of curriculum learning when applied to summarization. Moreover, we conduct a human evaluation to assess the proposed method on qualitative criteria, namely fluency, informativeness, and overall quality.
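To make the curriculum learning idea concrete, the following is a minimal sketch, not the paper's exact formulation: per-sample losses are scaled by weights derived from a hypothetical difficulty score and a training-progress variable, so easier samples dominate early updates and harder samples are phased in later. The function names, the sigmoid weighting scheme, and the `difficulty`, `progress`, and `sharpness` parameters are illustrative assumptions, not quantities defined in the abstract.

import torch
import torch.nn.functional as F

def curriculum_weights(difficulty: torch.Tensor, progress: float,
                       sharpness: float = 5.0) -> torch.Tensor:
    """Map difficulty scores in [0, 1] to sample weights.

    Early in training (progress near 0), hard samples are down-weighted;
    as progress grows toward 1, harder samples are gradually included.
    This weighting scheme is an assumption for illustration only.
    """
    return torch.sigmoid(sharpness * (progress - difficulty))

def reweighted_loss(logits: torch.Tensor, targets: torch.Tensor,
                    difficulty: torch.Tensor, progress: float) -> torch.Tensor:
    # Per-token cross-entropy, averaged per sample, then reweighted.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # (batch, seq_len)
    per_sample = per_token.mean(dim=1)                       # (batch,)
    weights = curriculum_weights(difficulty, progress)
    return (weights * per_sample).sum() / weights.sum()

# Toy usage: batch of 4 samples, vocabulary of 10, sequence length 6.
logits = torch.randn(4, 6, 10, requires_grad=True)
targets = torch.randint(0, 10, (4, 6))
difficulty = torch.tensor([0.1, 0.4, 0.7, 0.9])  # hypothetical difficulty scores
loss = reweighted_loss(logits, targets, difficulty, progress=0.3)
loss.backward()  # in practice this would propagate into the summarization model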