Clicks on rankings suffer from position bias: items at lower ranks are generally less likely to be examined, and thus clicked, by users, regardless of the users' actual preferences between items. The prevalent approach to unbiased click-based learning-to-rank (LTR) is based on counterfactual inverse-propensity-scoring (IPS) estimation. While counterfactual doubly-robust (DR) estimation is well established in general reinforcement learning, it has not previously been applied to click-based LTR. In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position bias. The difficulty with position bias is that the treatment, user examination, is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank, instead of the actual treatment that existing DR estimators use. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach and, in addition, provides an enormous decrease in variance: our experimental results indicate that it requires several orders of magnitude fewer datapoints to converge to optimal performance. For the unbiased LTR field, our DR estimator contributes both an increase in state-of-the-art performance and the most robust theoretical guarantees of all known LTR estimators.
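As a rough illustration of the estimation principle, and not the paper's actual estimator, the sketch below simulates a simple position-based click model and compares a standard IPS estimate of a single item's relevance with a standard doubly-robust estimate that combines an imputed regression value with a propensity-weighted correction. All names (rho, y_hat, true_relevance) and the simulation setup are illustrative assumptions; note that this generic DR form needs the examination indicator, which real click logs do not contain, and replacing it with the expected treatment per rank is precisely the gap the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy position-based click model (illustrative assumption, not the paper's setup):
# one item is shown N times at random ranks; a click requires examination.
N = 100_000
rho = np.array([1.0, 0.6, 0.3, 0.1])   # examination probability per rank (propensities)
true_relevance = 0.7                   # P(click | examined)

ranks = rng.integers(0, len(rho), size=N)
examined = rng.random(N) < rho[ranks]            # the treatment; unobserved in real logs
clicks = examined & (rng.random(N) < true_relevance)

# IPS: reweight each click by the inverse examination propensity of its rank.
ips = np.mean(clicks / rho[ranks])

# Generic DR: start from a (possibly wrong) regression estimate y_hat and add a
# propensity-weighted correction on the examined impressions. It is unbiased if
# either rho or y_hat is correct, and the correction residual shrinks as y_hat
# approaches the true relevance, which is where the variance reduction comes from.
y_hat = 0.6                                      # imputed relevance (assumed regression output)
dr = np.mean(y_hat + (examined / rho[ranks]) * (clicks - y_hat))

print(f"true relevance: {true_relevance:.3f}")
print(f"IPS estimate:   {ips:.3f}")
print(f"DR estimate:    {dr:.3f}")
```

In this toy setup both estimators converge to the true relevance, but the DR correction term has a smaller magnitude whenever y_hat is close to the truth, so the DR estimate stabilizes with fewer impressions; the sketch uses the simulated examination indicator directly, which is exactly what is unavailable in logged click data.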