Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal. While it was recently shown how counterfactual learning-to-rank (LTR) approaches \cite{Joachims/etal/17a} can provably overcome presentation bias when observation propensities are known, it remains to show how to effectively estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. First, we show how to harvest a specific type of intervention data from the historic feedback logs of multiple different ranking functions, and we show that this data is sufficient for consistent propensity estimation in the position-based model. Second, we propose a new extremum estimator that makes effective use of this data. In an empirical evaluation, we find that the new estimator provides superior propensity estimates in two real-world systems -- Arxiv Full-text Search and Google Drive Search. Beyond these two systems, we find that the method is robust to a wide range of settings in simulation studies.
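To make the estimation problem concrete, the following is a minimal sketch of the position-based click model and the intervention-harvesting idea summarized above; the notation ($p_k$ for the examination propensity at rank $k$, $\mathrm{rel}(q,d)$ for the relevance of document $d$ to query $q$) is our shorthand, following the standard position-based model, and is not necessarily the paper's exact formulation.

% Position-based model: a click requires examination (with probability p_k,
% depending only on the rank k) and relevance (depending only on q and d).
\begin{equation*}
  P\bigl(C = 1 \mid q, d, \mathrm{rank}(d \mid q) = k\bigr) \;=\; p_k \cdot \mathrm{rel}(q, d)
\end{equation*}
% Intervention harvesting: if two historic rankers place the same (q, d)
% pair at different ranks k and k', the unknown relevance term cancels in
% the ratio of click rates, identifying the relative propensities without
% any manual relevance judgments:
\begin{equation*}
  \frac{P(C = 1 \mid q, d, \mathrm{rank}(d \mid q) = k)}{P(C = 1 \mid q, d, \mathrm{rank}(d \mid q) = k')} \;=\; \frac{p_k}{p_{k'}}
\end{equation*}

Note that taking such ratios on raw click counts directly would be noisy; the role of the extremum estimator mentioned above is, roughly, to pool all of these pairwise comparisons into a single estimation objective over the $p_k$.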