In this work, we present a novel way of computing inverse propensity scoring (IPS) weights using a position-bias model for deterministic logging policies. This technique significantly widens the range of policies on which off-policy evaluation (OPE) can be used. We validate the technique in two experiments on industry-scale data. The OPE results are strongly correlated with the online results, with some constant bias. The estimator requires the examination model to be a reasonably accurate approximation of real user behaviour.
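To make the idea concrete, the following is a minimal sketch of how an examination model can stand in for propensities when the logging policy is deterministic. It assumes a simple position-based model in which the probability of a click factorises into an examination probability that depends only on rank and a relevance term that depends only on the item. The function name `position_bias_ips`, the impression layout (`logged`, `target`, `clicks`), and the example examination probabilities are all hypothetical and chosen for illustration; this is not necessarily the exact estimator used in this work.

```python
from typing import Dict, List, Sequence


def position_bias_ips(
    impressions: List[Dict],
    examination: Sequence[float],
) -> float:
    """Estimate expected clicks per impression under a target ranking policy.

    Assumption (not taken from the source): clicks follow a position-based
    model, so each logged click is reweighted by the ratio of the examination
    probability at the target rank to that at the logged rank.
    """
    if not impressions:
        raise ValueError("at least one impression is required")

    total = 0.0
    for imp in impressions:
        # Map each item to its 0-based rank under each policy.
        logged_rank = {item: k for k, item in enumerate(imp["logged"])}
        target_rank = {item: k for k, item in enumerate(imp["target"])}
        for item in imp["clicks"]:
            k_target = target_rank.get(item)
            if k_target is None:
                # The target policy does not show this item: zero weight.
                continue
            k_logged = logged_rank[item]
            total += examination[k_target] / examination[k_logged]
    return total / len(impressions)


# Hypothetical examination probabilities for ranks 0, 1, 2.
examination = [1.0, 0.5, 0.25]
impressions = [
    {"logged": ["a", "b", "c"], "target": ["b", "a", "c"], "clicks": {"b"}},
    {"logged": ["a", "b", "c"], "target": ["b", "a", "c"], "clicks": {"a"}},
]
estimate = position_bias_ips(impressions, examination)
```

Because the weights are ratios of examination probabilities, any error in the examination model carries directly into the estimate, which is why the model needs to approximate real user behaviour reasonably well.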