Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking; there is currently no counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
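To make the idea concrete, below is a minimal sketch (not the authors' implementation) of a policy-aware inverse-propensity estimate of DCG from logged top-k clicks. It assumes a position-based examination model with per-rank examination probabilities `theta`, and a stochastic logging policy represented explicitly by its rankings and their probabilities; all names (`theta`, `rankings`, `ranking_probs`) are illustrative.

```python
import numpy as np

def policy_aware_propensity(doc, rankings, ranking_probs, theta, k):
    """Expected probability that `doc` is examined under the stochastic
    logging policy: sum over rankings of P(ranking) * theta[rank of doc],
    counting only appearances in the top-k (items below k are never shown)."""
    rho = 0.0
    for ranking, p in zip(rankings, ranking_probs):
        if doc in ranking[:k]:
            rho += p * theta[ranking.index(doc)]
    return rho

def estimate_dcg(clicked_docs, new_ranking, rankings, ranking_probs, theta, k):
    """Counterfactual DCG estimate for `new_ranking` from logged clicks,
    reweighting each click by its policy-aware propensity."""
    dcg = 0.0
    for doc in clicked_docs:
        rho = policy_aware_propensity(doc, rankings, ranking_probs, theta, k)
        # Unbiasedness requires rho > 0 for every relevant item, i.e. a
        # non-zero probability of appearing in the top-k under the policy.
        rank = new_ranking.index(doc)             # rank under the evaluated ranker
        dcg += (1.0 / np.log2(rank + 2)) / rho    # IPS-weighted DCG gain
    return dcg

# Toy example: a policy that shows two rankings with equal probability,
# top-2 feedback, and one logged click on document "b".
rankings = [["a", "b", "c"], ["b", "a", "c"]]
ranking_probs = [0.5, 0.5]
theta = [1.0, 0.5, 0.25]   # assumed examination probabilities per rank
print(estimate_dcg(["b"], ["b", "a", "c"], rankings, ranking_probs, theta, k=2))
```

In this sketch the stochasticity of the logging policy is what keeps the propensities non-zero: even an item ranked below position k by one ranking can still be examined under another ranking the policy might show, which is the condition the unbiasedness proof relies on.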