Online shopping is an increasingly common part of everyday life. For e-commerce search systems, understanding natural language from voice assistants, chatbots, or conversational search is essential to determining what the user really wants. However, evaluation datasets that capture the natural, detailed information needs of product seekers and can be used for research do not exist. Because of privacy concerns and competitive risks, only a few datasets with real user search queries from logs are openly available. In this paper, we present a dataset of 3,540 natural language queries in two domains, in which users describe what they want when searching for a laptop or a jacket of their choice. The dataset includes annotations of vague terms and key facts for 1,754 laptop queries. This dataset opens up a range of research opportunities in natural language processing and (interactive) information retrieval for product search.