Conversational search has recently attracted increased attention in both the IR and NLP communities. It seeks to clarify and resolve a user's search need through multi-turn natural language interactions. However, most existing systems are trained and demonstrated on recorded or artificial conversation logs. Ultimately, conversational search systems should be trained, evaluated, and deployed in an open-ended setting with unseen conversation trajectories. A key challenge is that both training and evaluating such systems require a human in the loop, which is expensive and does not scale. One strategy for addressing this is to simulate users, thereby reducing the cost of scaling. However, current user simulators either can only respond to yes-no questions from the conversational search system or cannot produce high-quality responses in general. In this paper, we show that the current state-of-the-art user simulation system can be significantly improved upon by replacing it with a smaller but more advanced natural language generation model. Rather than merely reporting this new state of the art, we present an in-depth investigation of the task of simulating user responses for conversational search. Our goal is to supplement existing work with a manual analysis of the challenges that the advanced model still leaves unsolved, and to propose solutions for them. The challenges we identify are (1) dataset noise, (2) a blind spot that is difficult for existing models to learn, and (3) a specific type of misevaluation in the standard empirical setup. For all but the dataset noise issue, we propose solutions: one to cover the training blind spot and one to avoid the misevaluation. These solutions lead to further improvements, and our best system significantly improves on the previous state of the art.