Click logs are valuable resources for a variety of information retrieval (IR) tasks, including query understanding and analysis, as well as learning effective IR models, particularly when the models require large amounts of training data. We release a large-scale, domain-specific dataset of click logs obtained from user interactions with the Trip Database health web search engine. Our click log dataset comprises approximately 5.2 million user interactions collected between 2013 and 2020. We use this dataset to create a standard IR evaluation benchmark, TripClick, with around 700,000 unique free-text queries and 1.3 million query-document pairs whose relevance is estimated by two click-through models. As such, the collection is one of the few datasets offering the data richness and scale needed to train neural IR models with a large number of parameters, and notably the first in the health domain. Using TripClick, we conduct experiments to evaluate a variety of IR models, showing the benefits of exploiting this data to train neural architectures. In particular, the evaluation results show that the best-performing neural IR model significantly outperforms classical IR models by a large margin, especially for more frequent queries.
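The abstract does not specify which two click-through models are used to turn raw interactions into relevance signals. Purely as an illustration, the sketch below assumes a hypothetical log format of (query, shown documents, clicked documents) records and estimates a per-pair relevance score as the document click-through rate (clicks divided by impressions), one of the simplest click models. The record layout and the function name `ctr_relevance` are hypothetical and are not part of the TripClick release.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical log record: (query, shown document ids, clicked document ids).
LogRecord = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def ctr_relevance(logs: Iterable[LogRecord]) -> Dict[Tuple[str, str], float]:
    """Estimate relevance of each (query, doc) pair as clicks / impressions.

    Assumption: a document counts as an impression whenever it is shown for
    the query, and as a click only if it was both shown and clicked.
    """
    impressions: Dict[Tuple[str, str], int] = defaultdict(int)
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, shown, clicked in logs:
        clicked_set = set(clicked)
        for doc in shown:
            impressions[(query, doc)] += 1
            if doc in clicked_set:
                clicks[(query, doc)] += 1
    # Every key in `impressions` has a count >= 1, so division is safe.
    return {pair: clicks[pair] / n for pair, n in impressions.items()}


if __name__ == "__main__":
    # Toy, made-up log entries for demonstration only.
    logs = [
        ("asthma", ("d1", "d2", "d3"), ("d1",)),
        ("asthma", ("d1", "d2"), ("d2",)),
        ("diabetes", ("d4",), ()),
    ]
    for pair, score in sorted(ctr_relevance(logs).items()):
        print(pair, score)
```

A raw click-based model can be read off the same counts by labeling any pair with at least one click as relevant; click models that correct for position bias would additionally need the rank at which each document was shown.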