Summarization has usually relied on gold-standard summaries to train extractive or abstractive models. Social media poses a challenge for summarization techniques, since it requires a multi-document, multi-author approach. We address this challenging task by introducing a novel method that generates abstractive summaries of online news discussions. Our method extends a BERT-based architecture with an attention encoding that incorporates comments' likes during the training stage. To train our model, we define a task that consists of reconstructing high-impact comments based on their popularity (likes). Accordingly, our model learns to summarize online discussions based on their most relevant comments. Our approach produces a summary that represents the most relevant aspects of a news item that users comment on, incorporating the social context as a source of information for summarizing texts in online social networks. The model is evaluated using ROUGE scores between the generated summary and each comment on the thread. Our model, including the social attention encoding, significantly outperforms both extractive and abstractive summarization methods under this evaluation.
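To make the evaluation protocol concrete, the sketch below computes a ROUGE-1 F1 score between a generated summary and each comment on a thread, then averages over the thread. This is a minimal, self-contained illustration of unigram-overlap ROUGE; the example summary and comments are hypothetical, and the paper's evaluation would use standard ROUGE tooling rather than this simplified implementation.

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    if not ref or not cand:
        return 0.0
    # Each candidate unigram counts at most as often as it appears in the reference.
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical thread: score the summary against every comment, then average.
summary = "users debate the fairness of the new policy"
comments = [
    "the new policy is unfair to many users",
    "users debate whether the policy actually helps",
]
scores = [rouge1_f(comment, summary) for comment in comments]
mean_score = sum(scores) / len(scores)
```

Averaging per-comment scores in this way rewards summaries that cover the aspects most comments discuss, which matches the paper's evaluation of summaries against every comment on the thread.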