Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs such as the position of a document. A successful factorization will allow the relevance tower to be exempt from biases. In this work, we identify a critical issue that existing ULTR methods ignored - the bias tower can be confounded with the relevance tower via the underlying true relevance. In particular, the positions were determined by the logging policy, i.e., the previous production model, which would possess relevance information. We give both theoretical analysis and empirical results to show the negative effects on relevance tower due to such a correlation. We then propose three methods to mitigate the negative confounding effects by better disentangling relevance and bias. Empirical results on both controlled public datasets and a large-scale industry dataset show the effectiveness of the proposed approaches.
翻译:无偏见地学习排名(LUTR)研究从隐性用户反馈数据(如点击)中减少各种偏见的问题,最近受到相当重视。流行的LUTR对现实世界应用采用双塔结构,将点击模型纳入具有定期输入特点的相关塔,以及带有偏见性投入(如文件的位置)的偏向塔。成功的乘数将使相关塔免受偏见的影响。在这项工作中,我们发现一个关键问题,即现有的LUTR方法被忽视了,偏向塔可以通过潜在的真正相关性与相关塔混为一谈。特别是,立场是由伐木政策(即以前的生产模型)确定的,该模型将拥有相关信息。我们进行理论分析和实验结果,以表明由于这种关联性而对相关塔产生的消极影响。我们然后提出三种方法,通过更好地断开关联性和偏见来减轻负面的纠结效应。关于受控公共数据集和大型工业数据集的预测结果显示拟议方法的有效性。