This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations, including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations, with both human-written and machine-generated questions. We segment long conversations into chunks and use a question generator and a dialogue summarizer as auxiliary tools to collect multi-hop questions. The dataset has two testing scenarios, chunk mode and full mode, depending on whether the grounded chunk is provided or must be retrieved from a large conversational pool. Experimental results show that state-of-the-art QA systems trained on existing QA datasets have limited zero-shot ability and tend to predict our questions as unanswerable. Fine-tuning such systems on our corpus achieves significant improvements of up to 23.6% and 13.6% in chunk mode and full mode, respectively.
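To make the two testing scenarios concrete, the sketch below illustrates chunk segmentation and full-mode retrieval under simple assumptions: a fixed-size utterance chunker and a BM25 retriever (via the rank_bm25 package). The chunk size and the choice of retriever are illustrative, not the paper's exact pipeline.

```python
# Minimal sketch of the two testing scenarios: in "chunk mode" the grounded
# chunk is given to the QA model; in "full mode" it must first be retrieved
# from a pool of chunks. Chunk size and BM25 are illustrative assumptions.
from rank_bm25 import BM25Okapi  # pip install rank-bm25


def segment(conversation, utterances_per_chunk=8):
    """Split a conversation (list of utterance strings) into fixed-size chunks."""
    return [
        " ".join(conversation[i:i + utterances_per_chunk])
        for i in range(0, len(conversation), utterances_per_chunk)
    ]


def retrieve_chunk(question, chunk_pool):
    """Full mode: rank the pool with BM25 and return the top-scoring chunk."""
    bm25 = BM25Okapi([chunk.lower().split() for chunk in chunk_pool])
    return bm25.get_top_n(question.lower().split(), chunk_pool, n=1)[0]


# Chunk mode: answer_question(question, gold_chunk)
# Full mode:  answer_question(question, retrieve_chunk(question, chunk_pool))
```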