Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods are divided into online approaches - which learn by directly interacting with users - and counterfactual approaches - which learn from historical interactions. Existing online methods are hindered without online interventions and thus should not be applied counterfactually. Conversely, counterfactual methods cannot directly benefit from online interventions. We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division, as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effects of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results, conducted in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
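To make the correction idea concrete, the sketch below illustrates one simplified instance of the general approach: an inverse-propensity-style correction for position bias in which a document's examination propensity is averaged over all logging policies that were active during data collection, i.e. across online interventions. This is a minimal illustration under a position-bias-only click model, not the paper's full estimator (it omits trust bias and item-selection bias), and the names exam_prob, policy_ranks, and clicks are hypothetical inputs assumed for this example.

```python
import numpy as np

# Assumed (hypothetical) inputs for illustration:
#   exam_prob[k]       - probability that rank k is examined (position-bias model)
#   policy_ranks[t, d] - rank at which the logging policy active at interaction t
#                        displayed document d (interventions change this over time)
#   clicks[t, d]       - 1 if document d was clicked in interaction t, else 0

def intervention_aware_propensities(policy_ranks, exam_prob):
    """Examination propensity per document, averaged over all logged policies."""
    # exam_prob[policy_ranks] has shape (num_logs, num_docs); average over logs.
    return exam_prob[policy_ranks].mean(axis=0)

def corrected_relevance_estimate(clicks, policy_ranks, exam_prob):
    """Click-through rates corrected by the averaged (intervention-aware) propensities."""
    rho = intervention_aware_propensities(policy_ranks, exam_prob)  # shape: (num_docs,)
    # Reweight each click by its document's averaged propensity, then average over logs.
    return (clicks / rho).mean(axis=0)

# Toy usage: 3 documents, 2 interactions logged under two different policies.
exam_prob = np.array([1.0, 0.5, 0.25])            # examination probability per rank
policy_ranks = np.array([[0, 1, 2], [2, 0, 1]])   # displayed ranks per policy
clicks = np.array([[1, 0, 0], [0, 1, 0]])         # observed clicks
print(corrected_relevance_estimate(clicks, policy_ranks, exam_prob))
```

Averaging the propensity over the sequence of deployed policies, rather than using a single fixed logging policy, is what allows data gathered before and after an intervention to be combined in one unbiased estimate under the assumed click model.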