A well-known problem when learning from user clicks is the presence of inherent biases in the data, such as position or trust bias. Click models are a common method for extracting information from user clicks, such as document relevance in web search, or for estimating click biases for downstream applications such as counterfactual learning-to-rank, ad placement, or fair ranking. Recent work shows that the current evaluation practices in the community fail to guarantee that a well-performing click model generalizes well to downstream tasks in which the ranking distribution differs from the training distribution, i.e., under covariate shift. In this work, we propose an evaluation metric based on conditional independence testing to detect a lack of robustness to covariate shift in click models. We introduce the concept of debiasedness and a metric for measuring it. We prove that debiasedness is a necessary condition for recovering unbiased and consistent relevance scores and for the invariance of click prediction under covariate shift. In extensive semi-synthetic experiments, we show that our proposed metric helps to predict the downstream performance of click models under covariate shift and is useful in an off-policy model selection setting.
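To make the idea of debiasedness concrete: intuitively, a debiased click model's relevance estimates should carry no information about the logging policy's ranking scores beyond what true relevance already explains, which can be checked with a conditional independence test. The following is a minimal sketch of such a test based on partial correlation, runnable on semi-synthetic data where true relevance is available; the function name, the variable names, and the use of linear residualization are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np
from scipy import stats


def partial_correlation_ci_test(estimated_relevance, policy_scores, true_relevance):
    """Test conditional independence of two variables given a third via partial correlation.

    Regress out the conditioning variable (true relevance) from both inputs
    with ordinary least squares, then correlate the residuals. A correlation
    near zero is consistent with conditional independence (under
    linear-Gaussian assumptions). Returns the partial correlation and p-value.
    """
    # Design matrix: intercept + conditioning variable
    z = np.column_stack([np.ones_like(true_relevance), true_relevance])
    # OLS residuals of each variable after regressing out true relevance
    res_x = estimated_relevance - z @ np.linalg.lstsq(z, estimated_relevance, rcond=None)[0]
    res_y = policy_scores - z @ np.linalg.lstsq(z, policy_scores, rcond=None)[0]
    return stats.pearsonr(res_x, res_y)


# Toy simulation (hypothetical data-generating process):
rng = np.random.default_rng(0)
n = 10_000
true_rel = rng.normal(size=n)
policy = true_rel + 0.5 * rng.normal(size=n)  # logging policy correlates with relevance

# A "debiased" model: estimates depend on true relevance only (plus noise)
debiased_est = true_rel + 0.1 * rng.normal(size=n)
# A "biased" model: estimates leak the logging policy's scores
biased_est = 0.5 * true_rel + 0.5 * policy

print(partial_correlation_ci_test(debiased_est, policy, true_rel))  # r near 0
print(partial_correlation_ci_test(biased_est, policy, true_rel))    # r clearly nonzero
```

Under this sketch, a significantly nonzero partial correlation flags a model whose relevance estimates still depend on the logging policy given true relevance, i.e., a lack of debiasedness and a warning sign for performance under covariate shift.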