De-Biased Modelling of Search Click Behavior with Reinforcement Learning

Users' clicks on Web search results are one of the key signals for evaluating and improving Web search quality and have been widely used as part of current state-of-the-art Learning-To-Rank (LTR) models. With a large volume of search logs available for major search engines, effective models of searcher click behavior have emerged to evaluate and train LTR models. However, when modeling users' click behavior, it is imperative to account for the biases in that behavior. In particular, when a search result is not clicked, the user has not necessarily judged it irrelevant; it could simply have been missed, especially if it is a lower-ranked result. These kinds of biases in the click log data can be incorporated into the click models, propagating the errors to the resulting LTR ranking models or evaluation metrics. In this paper, we propose the De-biased Reinforcement Learning Click model (DRLC). The DRLC model relaxes previously made assumptions about the users' examination behavior and resulting latent states. To implement the DRLC model, convolutional neural networks are used as the value networks for reinforcement learning, trained to learn a policy to reduce bias in the click logs. To demonstrate the effectiveness of the DRLC model, we first compare its performance with previous state-of-the-art approaches using established click prediction metrics, including log-likelihood and perplexity. We further show that DRLC also leads to improvements in ranking performance. Our experiments demonstrate the effectiveness of the DRLC model in learning to reduce bias in click logs, leading to improved modeling performance and showing the potential of DRLC for improving Web search quality.
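The abstract names log-likelihood and perplexity as the click prediction metrics but does not spell them out. The sketch below uses the definitions commonly found in the click-model literature, which may differ in detail from what the paper computes. The function names, the 0-based rank index, and the clipping constant eps are illustrative assumptions, not taken from the paper.

```python
import math


def click_log_likelihood(clicks, probs, eps=1e-12):
    """Mean natural-log likelihood of the observed clicks, averaged over all results.

    clicks: list of sessions, each a list of 0/1 click indicators per rank.
    probs:  same shape, the model's predicted click probability per rank.
    eps:    assumed clipping constant that keeps log() finite.
    """
    total = 0.0
    count = 0
    for session_clicks, session_probs in zip(clicks, probs):
        for c, p in zip(session_clicks, session_probs):
            p = min(max(p, eps), 1.0 - eps)
            total += math.log(p) if c else math.log(1.0 - p)
            count += 1
    return total / count


def perplexity_at_rank(clicks, probs, rank, eps=1e-12):
    """Perplexity at one (0-based) rank: 2 ** (-mean log2-likelihood at that rank).

    A perfect predictor approaches 1.0; a constant 0.5 guess gives 2.0.
    """
    total = 0.0
    for session_clicks, session_probs in zip(clicks, probs):
        c = session_clicks[rank]
        p = min(max(session_probs[rank], eps), 1.0 - eps)
        total += math.log2(p) if c else math.log2(1.0 - p)
    return 2.0 ** (-total / len(clicks))


# Toy data (made up for illustration): two sessions, two ranks each.
toy_clicks = [[1, 0], [0, 0]]
toy_probs = [[0.8, 0.3], [0.4, 0.1]]
print(click_log_likelihood(toy_clicks, toy_probs))
print(perplexity_at_rank(toy_clicks, toy_probs, rank=0))
```

Higher log-likelihood and lower perplexity both indicate better click prediction. Reporting perplexity per rank shows how well a model handles the lower-ranked results, where unclicked results are most likely to have been missed rather than judged irrelevant.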

