Translating verbose information needs into crisp search queries is ubiquitous but poorly understood. Insights into this process could be valuable in several applications, including synthesizing large, privacy-friendly query logs from public Web sources that are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment in which crowdworkers submit the search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.