In an information-seeking conversation, a user converses with an agent to ask a series of questions that are often under- or over-specified. An ideal agent would first recognize that it is in such a situation by searching through its underlying knowledge source, and then interact appropriately with the user to resolve it. However, most existing studies either omit such agent-side initiative or incorporate it only artificially. In this work, we present INSCIT (pronounced Insight), a dataset for information-seeking conversations with mixed-initiative interactions. It contains a total of 4.7K user-agent turns from 805 human-human conversations in which the agent searches over Wikipedia and either asks for clarification or provides relevant information to address user queries. We define two subtasks, namely evidence passage identification and response generation, as well as a new human evaluation protocol to assess model performance. We report results of two strong baselines based on state-of-the-art models for conversational knowledge identification and open-domain question answering. Both models significantly underperform humans and fail to generate coherent and informative responses, suggesting ample room for improvement in future studies.
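To make the two subtasks concrete, the sketch below shows one plausible way a single conversation turn could be represented. This is a minimal illustration under our own assumptions: the class and field names (Conversation, Turn, AgentResponse, evidence_passages, and so on) are hypothetical and do not reflect the dataset's actual release schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema sketch for one INSCIT-style conversation.
# All names below are illustrative assumptions, not the official format.

@dataclass
class AgentResponse:
    response_type: str            # e.g. "clarification" or "relevant information"
    response_text: str            # the agent's natural-language reply
    evidence_passages: List[str]  # Wikipedia passage IDs supporting the reply

@dataclass
class Turn:
    user_query: str                # possibly under- or over-specified question
    agent_response: AgentResponse  # target of the response generation subtask

@dataclass
class Conversation:
    conversation_id: str
    turns: List[Turn] = field(default_factory=list)

# Subtask 1 (evidence passage identification): given the dialogue history and
# a Wikipedia corpus, predict `evidence_passages` for the current turn.
# Subtask 2 (response generation): given the same inputs plus the identified
# passages, generate `response_text`, which may ask for clarification or
# provide relevant information.
```

Under this framing, a model is evaluated both on retrieving the right supporting passages and on producing a response whose type and content are coherent with them.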