In a conversational question answering scenario, a questioner seeks to extract information about a topic through a series of interdependent questions and answers. As the conversation progresses, they may switch to related topics, a phenomenon commonly observed in information-seeking search sessions. However, current datasets for conversational question answering are limited in two ways: 1) they do not contain topic switches; and 2) they assume the reference text for the conversation is given, i.e., the setting is not open-domain. We introduce TopiOCQA (pronounced Tapioca), an open-domain conversational dataset with topic switches on Wikipedia. TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers. TopiOCQA poses a challenging test-bed for models, where efficient retrieval is required on multiple turns of the same conversation, in conjunction with constructing valid responses using conversational history. We evaluate several baselines by combining state-of-the-art document retrieval methods with neural reader models. Our best model achieves an F1 of 51.9 and a BLEU score of 42.1, which fall short of human performance by 18.3 and 17.6 points respectively, indicating the difficulty of our dataset. Our dataset and code will be available at https://mcgill-nlp.github.io/topiocqa
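As a rough illustration of the F1 metric reported above, the sketch below computes the standard SQuAD-style token-overlap F1 commonly used for free-form answers. This is a minimal sketch under my own assumptions: the exact answer normalization used in TopiOCQA's evaluation is not specified here, and the helper name `token_f1` is illustrative only.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a gold free-form answer
    (SQuAD-style overlap metric; normalization here is simplified)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count tokens shared between prediction and reference (with multiplicity)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partial overlap between a predicted and a gold answer
print(token_f1("the capital of france", "capital of france is paris"))
```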