Although pre-trained language models (PLMs) have achieved great success and become a milestone in NLP, abstractive conversational summarization remains a challenging but less studied task. The difficulty lies in two aspects: one is the lack of large-scale conversational summarization data, and the other is that applying existing pre-trained models to this task is tricky because of the structural dependencies within conversations, their informal expressions, etc. In this work, we first build a large-scale (11M) pre-training dataset called RCS, based on multi-person discussions in the Reddit community. We then present TANet, a thread-aware Transformer-based network. Unlike existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependencies among utterances play an essential role in understanding the entire conversation, and we therefore propose two new techniques to incorporate this structural information into our model. The first is thread-aware attention, which is computed by taking the contextual dependencies among utterances into account. The second is a thread prediction loss, which is used to predict the relations between utterances. We evaluate our model on four datasets of real conversations, covering meeting transcripts, customer-service records, and forum threads. Experimental results demonstrate that TANet achieves a new state-of-the-art in terms of both automatic evaluation and human judgment.
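To make the two techniques more concrete, the following is a minimal illustrative sketch rather than the paper's exact formulation: we assume here that thread-aware attention biases standard scaled dot-product attention with a mask derived from the reply-to (thread) structure, and that thread prediction is a pairwise classification loss over utterances, added to the summarization objective with a hypothetical weight $\lambda$.

\[
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}} + M\right)V,
\qquad
M_{ij} =
\begin{cases}
0, & \text{if tokens } i \text{ and } j \text{ lie on the same reply thread},\\
-\infty, & \text{otherwise},
\end{cases}
\]
\[
\mathcal{L} = \mathcal{L}_{\mathrm{summ}} + \lambda\, \mathcal{L}_{\mathrm{thread}},
\qquad
\mathcal{L}_{\mathrm{thread}} = -\sum_{(u_a, u_b)} \log p\!\left(r_{ab} \mid u_a, u_b\right),
\]

where $r_{ab}$ indicates whether utterance $u_b$ replies to $u_a$. Under this reading, the attention mask injects the conversation's thread structure into the encoder, while the auxiliary loss encourages the model to recover that structure explicitly.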