Community-based Question Answering (CQA), which allows users to acquire their desired information, has increasingly become an essential component of online services in various domains such as e-commerce, travel, and dining. However, the overwhelming number of CQA pairs makes it difficult for users without a particular intent to find useful information scattered across them. To help users quickly digest the key information, we propose a novel CQA summarization task that aims to create a concise summary from CQA pairs. To this end, we first design a multi-stage data annotation process and create a benchmark dataset, CoQASUM, based on the Amazon QA corpus. We then compare a collection of extractive and abstractive summarization methods and establish a strong baseline approach, DedupLED, for the CQA summarization task. Our experiments further confirm two key challenges for the CQA summarization task: sentence-type transfer and deduplication. Our data and code are publicly available.