Aspect-Oriented Summarization through Query-Focused Extraction

A reader interested in a particular topic may want summaries of documents on that subject written with a particular focus, rather than the generic summaries that most summarization systems produce. While query-focused summarization has been explored in prior work, it is often approached from the standpoint of document-specific questions or on synthetic data. Real users' needs often align more closely with aspects: broad topics in a dataset that the user is interested in, rather than specific queries. In this paper, we collect a dataset of realistic aspect-oriented test cases, AspectNews, which covers different subtopics of articles in news sub-domains. We then investigate how query-focused methods, for which we can construct synthetic data, handle this aspect-oriented setting: we benchmark extractive query-focused training schemes and propose a contrastive augmentation approach to train the model. We evaluate on two aspect-oriented datasets and find that this approach yields (a) focused summaries, better than those from a generic summarization system, which go beyond simple keyword matching; and (b) a system sensitive to the choice of keywords.
