Clicks on rankings suffer from position bias: items at lower ranks are generally less likely to be examined, and thus clicked, by users, regardless of the users' actual preferences between items. The prevalent approach to unbiased click-based learning-to-rank (LTR) is based on counterfactual inverse-propensity-scoring (IPS) estimation. Unlike in general reinforcement learning, counterfactual doubly robust (DR) estimation has not previously been applied to click-based LTR. In this paper, we introduce a novel DR estimator that is the first DR approach specifically designed for position bias. The difficulty with position bias is that the treatment, user examination, is not directly observable in click data. As a solution, our estimator uses the expected treatment per rank instead of the actual treatment that existing DR estimators use. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach and, in addition, provides large reductions in variance: our experimental results indicate that it requires several orders of magnitude fewer datapoints to converge to optimal performance. For the unbiased LTR field, our DR estimator both advances state-of-the-art performance and provides the most robust theoretical guarantees of all known LTR estimators.
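To make the difference between IPS and DR estimation under position bias concrete, here is a minimal Python sketch. It is not the estimator from the paper. It is a simplified per-item illustration under stated assumptions: clicks follow the examination hypothesis (click probability = examination probability at the displayed rank × item relevance), the per-rank examination probabilities `THETA` are known hypothetical values, the logging policy shuffles three items uniformly, and `r_hat` is a fixed, hypothetical regression prediction of relevance. All names and values are illustrative. The DR-style correction subtracts the click probability predicted from the *expected* examination at the displayed rank, rather than from the unobservable actual examination, and divides by the item's expected examination `RHO` under the logging policy.

```python
import random
from statistics import mean, variance

# Hypothetical per-rank examination probabilities theta_k (assumed known).
THETA = [1.0, 0.5, 0.25]
# Under a logging policy that shuffles items uniformly, every item's expected
# examination is the average of THETA over all ranks.
RHO = sum(THETA) / len(THETA)


def simulate_log(relevance, n, rng):
    """Simulate n logged rankings of len(THETA) items, with clicks.

    Clicks follow the examination hypothesis:
    P(click | item d shown at rank k) = THETA[k] * relevance[d].
    Returns a list of (ranks, clicks) pairs, where ranks[d] is item d's rank.
    """
    items = list(range(len(THETA)))
    log = []
    for _ in range(n):
        ranking = items[:]
        rng.shuffle(ranking)
        ranks = {d: k for k, d in enumerate(ranking)}
        clicks = {d: int(rng.random() < THETA[ranks[d]] * relevance[d]) for d in items}
        log.append((ranks, clicks))
    return log


def ips_estimate(log, d):
    """IPS-style relevance estimate: mean click rate divided by expected examination."""
    return mean(clicks[d] for _, clicks in log) / RHO


def dr_estimate(log, d, r_hat):
    """DR-style estimate: model prediction plus an inverse-propensity correction.

    The unobservable examination is replaced by its expected value THETA[k]
    at the rank where d was shown, so the correction averages to zero when
    r_hat equals the true relevance.
    """
    correction = mean(clicks[d] - THETA[ranks[d]] * r_hat for ranks, clicks in log)
    return r_hat + correction / RHO


def run_experiment(relevance, r_hat, d, n, reps, seed=0):
    """Repeat the simulation and return (ips_mean, ips_var, dr_mean, dr_var)."""
    rng = random.Random(seed)
    ips, dr = [], []
    for _ in range(reps):
        log = simulate_log(relevance, n, rng)
        ips.append(ips_estimate(log, d))
        dr.append(dr_estimate(log, d, r_hat[d]))
    return mean(ips), variance(ips), mean(dr), variance(dr)


if __name__ == "__main__":
    ips_m, ips_v, dr_m, dr_v = run_experiment(
        relevance=[0.9, 0.5, 0.1], r_hat=[0.85, 0.55, 0.1], d=0, n=200, reps=400
    )
    print(f"IPS: mean={ips_m:.3f} var={ips_v:.5f}")
    print(f"DR:  mean={dr_m:.3f} var={dr_v:.5f}")
```

Under these assumptions both estimates are unbiased for the true relevance, because the expected correction is (relevance - r_hat) · RHO, which is divided by RHO and added back to r_hat. The DR-style estimate additionally removes the click variance caused by the item landing at different ranks: in this toy setting its variance is lower than the IPS-style estimate whenever `r_hat` lies strictly between 0 and twice the true relevance, and a poor `r_hat` outside that range makes it worse.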