Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging due to the selection bias, i.e., the observed clicked events usually happen on users' preferred items. Currently, most existing methods utilize counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error imputation based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation may result in its higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on it, a more robust doubly robust (MRDR) estimator has been proposed to further reduce its variance while retaining its double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation. Besides, we empirically verify that the proposed learning scheme can further eliminate the high variance problem of the imputation learning. To evaluate its effectiveness, extensive experiments are conducted on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over the state-of-the-art methods. The code is available at https://github.com/guosyjlu/MRDR-DL.


翻译:点击后转换是显示用户偏好的一个强烈信号,对于建立建议系统是有益的。然而,精确估计点击后转换率(CVR)由于选择偏差,即所观察到的点击事件通常发生在用户偏好的项目上,因此具有挑战性。目前,大多数现有方法都使用反事实学习来降低建议系统。其中,二元强(DR)估计器通过合并基于错误估算(EIB)的估测器和半偏差度估测器(IPS),取得了竞争性的性能。但是,准确估计后点击后转换率(CVVR)的偏差率(CVS)由于选择偏差,因此具有挑战性。不准确的误差可能会导致其差异高于IPS的估测器。更糟糕的是,现有方法通常使用简单的模型-认知法来估计浸泡错误,这不足以估计动态变化的模型相关目标(即预测模型的梯度方向),为了解决这些问题,我们首先从DR估测的偏差和偏差性(IP)得出DR估测器的偏差和差异。基于它,一个更稳健的深度的深度的模拟的模型,然后再研研研研研研研判。

0
下载
关闭预览

相关内容

Fariz Darari简明《博弈论Game Theory》介绍,35页ppt
专知会员服务
109+阅读 · 2020年5月15日
Keras François Chollet 《Deep Learning with Python 》, 386页pdf
专知会员服务
151+阅读 · 2019年10月12日
Hierarchically Structured Meta-learning
CreateAMind
26+阅读 · 2019年5月22日
Unsupervised Learning via Meta-Learning
CreateAMind
42+阅读 · 2019年1月3日
已删除
将门创投
5+阅读 · 2017年8月15日
强化学习 cartpole_a3c
CreateAMind
9+阅读 · 2017年7月21日
Arxiv
0+阅读 · 2021年7月16日
Arxiv
5+阅读 · 2021年4月21日
VIP会员
相关资讯
Hierarchically Structured Meta-learning
CreateAMind
26+阅读 · 2019年5月22日
Unsupervised Learning via Meta-Learning
CreateAMind
42+阅读 · 2019年1月3日
已删除
将门创投
5+阅读 · 2017年8月15日
强化学习 cartpole_a3c
CreateAMind
9+阅读 · 2017年7月21日
Top
微信扫码咨询专知VIP会员