Conversational information seeking (CIS) is playing an increasingly important role in connecting people to information. Due to the lack of suitable resources, previous studies on CIS have been limited to theoretical/conceptual frameworks, laboratory-based user studies, or a particular aspect of CIS (e.g., asking clarifying questions). In this work, we facilitate research on CIS in three ways. (1) We formulate a pipeline for CIS with six sub-tasks: intent detection (ID), keyphrase extraction (KE), action prediction (AP), query selection (QS), passage selection (PS), and response generation (RG). (2) We release a benchmark dataset, called Wizard of Search Engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS. (3) We design a neural architecture that can be trained and evaluated on the six sub-tasks both jointly and separately, and devise a pre-train/fine-tune learning scheme that reduces the scale required of WISE by making full use of available data. We report some useful characteristics of CIS based on statistics of WISE. We also show that our best-performing model variant is able to achieve effective CIS, as indicated by several metrics. We release the dataset, the code, and the evaluation scripts so that future research can measure further improvements in this important research direction.
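To make the six-sub-task pipeline in (1) more concrete, the following is a minimal Python sketch of how the stages might be chained for a single user turn. Only the sub-task names (ID, KE, AP, QS, PS, RG) come from the abstract. The stage order, the `Turn` record, the function names, and the rule-based stand-ins for each stage are assumptions made for illustration; they are not the neural architecture or the WISE data format described in this work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """Hypothetical record holding the output of each sub-task for one turn."""
    utterance: str
    intent: str = ""
    keyphrases: List[str] = field(default_factory=list)
    action: str = ""
    query: str = ""
    passage: str = ""
    response: str = ""


# Toy stop-word list; an assumption for this sketch only.
STOP_WORDS = {"what", "is", "the", "a", "an", "of"}


def detect_intent(turn: Turn) -> None:
    # ID: toy rule, a trailing question mark means the user is asking.
    turn.intent = "ask" if turn.utterance.rstrip().endswith("?") else "inform"


def extract_keyphrases(turn: Turn) -> None:
    # KE: keep lower-cased words that are not stop words.
    words = turn.utterance.lower().strip("?.! ").split()
    turn.keyphrases = [w for w in words if w not in STOP_WORDS]


def predict_action(turn: Turn) -> None:
    # AP: answer when there is something to search for, otherwise clarify.
    turn.action = "answer" if turn.keyphrases else "clarify"


def select_query(turn: Turn) -> None:
    # QS: here the query is simply the joined keyphrases.
    turn.query = " ".join(turn.keyphrases)


def select_passage(turn: Turn, passages: List[str]) -> None:
    # PS: pick the passage with the largest word overlap with the query.
    query_words = set(turn.query.split())
    turn.passage = max(
        passages, key=lambda p: len(query_words & set(p.lower().split()))
    )


def generate_response(turn: Turn) -> None:
    # RG: echo the selected passage, or ask a clarifying question.
    if turn.action == "clarify":
        turn.response = "Could you tell me more about what you are looking for?"
    else:
        turn.response = turn.passage


def run_pipeline(utterance: str, passages: List[str]) -> Turn:
    """Run the six stages in sequence on one utterance."""
    turn = Turn(utterance)
    detect_intent(turn)
    extract_keyphrases(turn)
    predict_action(turn)
    if turn.action == "answer":
        select_query(turn)
        select_passage(turn, passages)
    generate_response(turn)
    return turn


if __name__ == "__main__":
    corpus = [
        "WISE is a benchmark dataset for conversational information seeking.",
        "Paris is the capital of France.",
    ]
    result = run_pipeline("What is WISE?", corpus)
    print(result.intent, result.action, result.query)
    print(result.response)
```

Keeping each stage as a separate function mirrors the idea that the sub-tasks can be evaluated separately as well as end to end; in a learned system each function body would be replaced by a model component.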