In a conversational question answering scenario, a questioner seeks to extract information about a topic through a series of interdependent questions and answers. As the conversation progresses, they may switch to related topics, a phenomenon commonly observed in information-seeking search sessions. However, current datasets for conversational question answering are limited in two ways: 1) they do not contain topic switches; and 2) they assume the reference text for the conversation is given, i.e., the setting is not open-domain. We introduce TopiOCQA (pronounced Tapioca), an open-domain conversational dataset with topic switches on Wikipedia. TopiOCQA contains 3,920 conversations with information-seeking questions and free-form answers. TopiOCQA poses a challenging test-bed for models, where efficient retrieval is required on multiple turns of the same conversation, in conjunction with constructing valid responses using conversational history. We evaluate several baselines by combining state-of-the-art document retrieval methods with neural reader models. Our best model achieves an F1 of 51.9 and a BLEU score of 42.1, which fall short of human performance by 18.3 and 17.6 points respectively, indicating the difficulty of our dataset. Our dataset and code will be available at https://mcgill-nlp.github.io/topiocqa
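As a rough illustration of the F1 metric reported above, the sketch below computes the standard SQuAD-style token-overlap F1 commonly used for free-form answers. This is a minimal sketch under my own assumptions: the exact answer normalization used in TopiOCQA's evaluation is not specified here, and the helper name `token_f1` is illustrative only.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a gold free-form answer
    (SQuAD-style overlap metric; normalization here is simplified)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count tokens shared between prediction and reference (with multiplicity)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: partial overlap between a predicted and a gold answer
print(token_f1("the capital of france", "capital of france is paris"))
```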