Pre-trained sequence-to-sequence models such as BERTSUM and BART have achieved state-of-the-art results in abstractive summarization. During fine-tuning, the encoder transforms sentences into context vectors in the latent space and the decoder learns the summary generation task based on these context vectors. In our approach, we consider two clusters of salient and non-salient context vectors, so that the decoder can attend more to the salient context vectors during summary generation. To this end, we propose a novel clustering transformer layer between the encoder and the decoder, which first forms two clusters of salient and non-salient vectors, and then normalizes and shrinks the clusters to push them apart in the latent space. Our experimental results show that the proposed model outperforms the existing BART model by learning these distinct cluster patterns, improving ROUGE by up to 4% and BERTScore by 0.3% on average on the CNN/DailyMail and XSUM data sets.
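The abstract does not spell out how the clustering layer is realized. The sketch below illustrates one plausible reading, assuming a learned linear gate scores per-token salience, tokens are split into two clusters by thresholding that score, and each vector is normalized and pulled toward its cluster centroid; the module name, threshold, and shrink factor are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a clustering layer inserted between a seq2seq encoder and
# decoder. Assumptions (not from the paper): salience is scored with a learned
# linear gate, clusters are formed by thresholding at 0.5, and vectors are
# shrunk toward their cluster centroid by a fixed factor.
import torch
import torch.nn as nn


class ClusteringLayer(nn.Module):
    def __init__(self, hidden_size: int, shrink: float = 0.5):
        super().__init__()
        self.salience = nn.Linear(hidden_size, 1)  # per-token salience score
        self.norm = nn.LayerNorm(hidden_size)      # normalize context vectors
        self.shrink = shrink                       # pull strength toward the centroid

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, seq_len, hidden_size)
        states = self.norm(encoder_states)
        scores = torch.sigmoid(self.salience(states))  # (batch, seq_len, 1)
        salient_mask = (scores > 0.5).float()

        # Centroids of the salient and non-salient clusters.
        def centroid(mask: torch.Tensor) -> torch.Tensor:
            denom = mask.sum(dim=1, keepdim=True).clamp(min=1.0)
            return (states * mask).sum(dim=1, keepdim=True) / denom

        c_salient = centroid(salient_mask)
        c_nonsalient = centroid(1.0 - salient_mask)

        # Shrink each vector toward its cluster centroid so the two clusters
        # tighten and move apart in the latent space.
        target = salient_mask * c_salient + (1.0 - salient_mask) * c_nonsalient
        return states + self.shrink * (target - states)


if __name__ == "__main__":
    # Usage sketch: the layer sits between the encoder output and the
    # decoder's cross-attention input (dimensions mimic BART-base).
    layer = ClusteringLayer(hidden_size=768)
    enc_out = torch.randn(2, 128, 768)   # stand-in for encoder hidden states
    ctx = layer(enc_out)                 # the decoder attends to ctx instead
    print(ctx.shape)                     # torch.Size([2, 128, 768])
```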