While conversing with chatbots, humans tend to ask many questions, a significant portion of which can be answered by referring to large-scale knowledge graphs (KGs). While Question Answering (QA) and dialog systems have been studied independently, there is a need to study them jointly to evaluate real-world scenarios in which bots face both tasks. Towards this end, we introduce the task of Complex Sequential QA, which combines two tasks: (i) answering factual questions through complex inferencing over a realistic-sized KG of millions of entities, and (ii) learning to converse through a series of coherently linked QA pairs. Through a labor-intensive semi-automatic process involving in-house and crowdsourced workers, we created a dataset containing around 200K dialogs with a total of 1.6M turns. Further, unlike existing large-scale QA datasets, which contain simple questions that can be answered from a single tuple, the questions in our dialogs require a larger subgraph of the KG. Specifically, our dataset has questions that require logical, quantitative, and comparative reasoning, as well as combinations of these. This calls for models that can: (i) parse complex natural language questions, (ii) use conversation context to resolve coreferences and ellipsis in utterances, (iii) ask for clarifications on ambiguous queries, and finally (iv) retrieve relevant subgraphs of the KG to answer such questions. However, our experiments with a combination of state-of-the-art dialog and QA models show that they clearly do not achieve the above objectives and are inadequate for dealing with such complex real-world settings. We believe that this new dataset, coupled with the limitations of existing models reported in this paper, should encourage further research in Complex Sequential QA.