This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations, including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and require strong domain knowledge. In total, we collect 34,608 QA pairs from 10,259 selected conversations, with both human-written and machine-generated questions. We use a question generator and a dialogue summarizer as auxiliary tools to collect and recommend questions. The dataset has two testing scenarios: chunk mode and full mode, depending on whether the grounded partial conversation is provided or must be retrieved. Experimental results show that state-of-the-art pretrained QA systems have limited zero-shot performance and tend to predict our questions as unanswerable. Our dataset provides a new training and evaluation testbed to facilitate research on question answering over conversations.
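To make the two testing scenarios concrete, the following is a minimal sketch in Python of how a QA pair might be handled in each mode. The record layout (`question`, `answer`, `chunk`, `conversation_id`) and the retriever interface are illustrative assumptions for exposition, not the dataset's actual schema or the paper's retrieval method.

```python
# Minimal sketch of the two QAConv testing scenarios.
# Field names and the retriever interface are hypothetical, for illustration only.

from typing import Callable

# A hypothetical QA record: in chunk mode the grounded partial conversation
# (the "chunk") is given; in full mode only the source conversation is identified.
example = {
    "conversation_id": "conv-001",
    "question": "Who proposed the budget revision?",
    "answer": "Alice",
    "chunk": "Alice: I suggest we revise the Q3 budget ...",  # provided in chunk mode
}

def answer_chunk_mode(qa_model: Callable[[str, str], str], record: dict) -> str:
    """Chunk mode: the grounded partial conversation is provided directly,
    so the task reduces to reading comprehension over the given chunk."""
    return qa_model(record["question"], record["chunk"])

def answer_full_mode(
    qa_model: Callable[[str, str], str],
    retriever: Callable[[str, str], list[str]],
    record: dict,
) -> str:
    """Full mode: relevant chunks must first be retrieved from the full
    conversation (hypothetical retriever interface), then read by the QA model."""
    chunks = retriever(record["question"], record["conversation_id"])
    context = "\n".join(chunks)
    return qa_model(record["question"], context)
```

The design difference is that full mode prepends a retrieval step to reading comprehension, which is what makes it the harder of the two settings.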