A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production. Unfortunately, widely used off-policy evaluation methods either make strong assumptions about how users behave that can lead to excessive bias, or they make fewer assumptions and suffer from large variance. We tackle this problem by developing a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings, namely the position-based model and the item-position model. In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model, while providing an adaptable bias-variance trade-off compared to the item-position model. We provide theoretical arguments as well as empirical results that highlight the performance of our novel estimation approach.