An Adversarial Imitation Click Model for Information Retrieval

Modern information retrieval systems, including web search, ad placement, and recommender systems, typically rely on learning from user feedback. Click models, which study how users interact with a ranked list of items, provide a useful understanding of user feedback for learning ranking models. Constructing the "right" dependencies is the key to any successful click model. However, probabilistic graphical models (PGMs) have to rely on manually assigned dependencies and oversimplify user behaviors. Existing neural-network-based methods improve on PGMs by enhancing expressive ability and allowing flexible dependencies, but they still suffer from exposure bias and inferior estimation. In this paper, we propose a novel framework based on imitation learning, the Adversarial Imitation Click Model (AICM). First, we explicitly learn a reward function that recovers users' intrinsic utility and underlying intentions. Second, we model user interactions with a ranked list as a dynamic system instead of as one-step click prediction, which alleviates the exposure bias problem. Finally, we minimize the Jensen-Shannon (JS) divergence through adversarial training and learn a stable distribution of click sequences, which makes AICM generalize well across different distributions of ranked lists. A theoretical analysis shows that AICM reduces the exposure bias from $O(T^2)$ to $O(T)$. Our studies on a public web search dataset show that AICM not only outperforms state-of-the-art models on traditional click metrics but also achieves superior performance in addressing the exposure bias and recovering the underlying patterns of click sequences.
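The link between adversarial training and the JS divergence comes from the standard GAN identity: for a fixed data distribution $p$ and model distribution $q$, the best possible discriminator $D^*(x) = p(x)/(p(x)+q(x))$ attains the value $2\,\mathrm{JS}(p\,\|\,q) - \log 4$, so a generator that lowers the best discriminator's score is lowering the JS divergence. The Python sketch below checks this identity numerically on toy distributions over length-3 click sequences. It is not AICM itself: the independent per-position click model, the click probabilities, and the helper names (`sequence_distribution`, `js_divergence`, `optimal_discriminator_value`) are all hypothetical, and in actual adversarial training the discriminator is learned from samples rather than computed in closed form.

```python
import math
from itertools import product


def sequence_distribution(click_probs):
    """Distribution over binary click sequences, assuming (hypothetically)
    that each position is clicked independently with the given probability."""
    dist = {}
    for seq in product((0, 1), repeat=len(click_probs)):
        prob = 1.0
        for clicked, p_click in zip(seq, click_probs):
            prob *= p_click if clicked else 1.0 - p_click
        dist[seq] = prob
    return dist


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # KL(a || m); outcomes with a(k) = 0 contribute nothing.
        return sum(a[k] * math.log(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def optimal_discriminator_value(p, q):
    """GAN value E_p[log D*] + E_q[log(1 - D*)] with D* = p / (p + q)."""
    value = 0.0
    for k in set(p) | set(q):
        pk, qk = p.get(k, 0.0), q.get(k, 0.0)
        if pk + qk == 0:
            continue
        d_star = pk / (pk + qk)  # probability that sequence k came from logged data
        if pk > 0:
            value += pk * math.log(d_star)
        if qk > 0:
            value += qk * math.log(1.0 - d_star)
    return value


if __name__ == "__main__":
    logged = sequence_distribution([0.6, 0.3, 0.1])  # toy "real user" click sequences
    model = sequence_distribution([0.5, 0.35, 0.2])  # toy generated click sequences
    js = js_divergence(logged, model)
    print(f"JS          = {js:.6f}")
    print(f"V(D*)       = {optimal_discriminator_value(logged, model):.6f}")
    print(f"2*JS - log4 = {2 * js - math.log(4):.6f}")
```

Because the identity holds for any $q$, pushing the optimal discriminator's value down toward $-\log 4$ (its value when $q = p$) is the same as pushing the JS divergence toward zero.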
