A reader interested in a particular topic may want summaries of documents on that subject written with a particular focus, rather than the generic summaries that most summarization systems produce. Query-focused summarization has been explored in prior work, but it is usually approached through document-specific questions or synthetic data. Real users' needs often align more closely with aspects: broad topics within a dataset that the user cares about, rather than specific queries. In this paper, we collect a dataset of realistic aspect-oriented test cases, AspectNews, which covers different subtopics of articles in news sub-domains. We then investigate how query-focused methods, for which we can construct synthetic data, handle this aspect-oriented setting: we benchmark extractive query-focused training schemes and propose a contrastive augmentation approach for training the model. We evaluate on two aspect-oriented datasets and find that this approach yields (a) focused summaries that are better than those from a generic summarization system and that go beyond simple keyword matching, and (b) a system that is sensitive to the choice of keywords.