Asking clarifying questions to resolve the underlying user information need is an important feature of modern conversational search systems. However, evaluating such systems by answering their clarifying questions requires significant human effort, which is time-consuming and expensive. In this paper, we propose a conversational user simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers. Moreover, we make the first steps towards multi-turn interactions, where the conversational search system asks multiple questions to the (simulated) user with the goal of clarifying the user's need. To this end, we expand the currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT-2-based model is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting, and we discuss the capabilities of our model in the multi-turn setting. We provide the code, data, and pre-trained model for further research on the topic.
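For illustration only, the sketch below shows one way a simulated user could, in principle, answer a clarifying question by conditioning a pre-trained GPT-2 model on the information-need description and the system's question. The prompt format, decoding parameters, and model checkpoint ("gpt2") are assumptions for this example and do not reflect the actual USi implementation or training setup.

```python
# Hypothetical sketch of a GPT-2-based user simulator answering a clarifying question.
# NOTE: prompt layout and decoding settings are illustrative assumptions, not the paper's method.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Description of the underlying information need (what the simulated user "wants").
information_need = "Find beginner-friendly resources for learning the Rust programming language."
# Clarifying question posed by the conversational search system.
clarifying_question = "Are you looking for video tutorials or written documentation?"

# Condition the generation on both the information need and the question.
prompt = (
    f"Information need: {information_need}\n"
    f"Question: {clarifying_question}\n"
    f"Answer:"
)

input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=input_ids.shape[1] + 40,   # generate a short answer continuation
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

# Keep only the newly generated answer tokens.
answer = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(answer.strip())
```

In a multi-turn setting, each new clarifying question and the previously generated answers would simply be appended to the prompt before the next generation step, so the simulated user stays consistent with the conversation history.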