Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Improving the quality of search results can significantly enhance users' experience and engagement with search engines. Despite recent advances in machine learning and data mining, correctly classifying items for a particular user search query remains a long-standing challenge with considerable room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of difficult Amazon search queries and results, publicly released to foster research on improving the quality of search results. The dataset contains around 130 thousand unique queries and 2.6 million manually labeled (query, product) relevance judgements. The dataset is multilingual, with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is being used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research on product search.
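The three evaluation tasks can be illustrated with a minimal sketch. It assumes the four ESCI relevance categories named in the title (Exact, Substitute, Complement, Irrelevant) and scores a ranking with nDCG. The abstract does not state the metric or the gain values, so `ASSUMED_GAIN`, the helper names, and the sample ranking are all hypothetical and chosen only for illustration.

```python
import math

# ESCI labels: Exact, Substitute, Complement, Irrelevant.
# These gain values are illustrative assumptions, not taken from the abstract.
ASSUMED_GAIN = {"E": 1.0, "S": 0.1, "C": 0.01, "I": 0.0}


def dcg(labels):
    """Discounted cumulative gain of a ranked list of ESCI labels."""
    return sum(ASSUMED_GAIN[label] / math.log2(rank + 2)
               for rank, label in enumerate(labels))


def ndcg(ranked_labels):
    """Task (i): compare a ranking with the ideal ordering of the same labels."""
    ideal = sorted(ranked_labels, key=ASSUMED_GAIN.get, reverse=True)
    best = dcg(ideal)
    return dcg(ranked_labels) / best if best > 0 else 0.0


def substitute_targets(labels):
    """Task (iii): collapse the four-way labels of task (ii) into substitute vs. not."""
    return [1 if label == "S" else 0 for label in labels]


# Hypothetical system output for one query, best-ranked product first.
ranked = ["S", "E", "I", "C"]
print(round(ndcg(ranked), 3))
print(substitute_targets(ranked))
```

Under these assumptions, task (ii) is a four-class classification problem over the same labels, and task (iii) is its binary reduction; a ranking that places the Exact product below a Substitute scores below 1.0.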

