Post-click conversion is a strong signal of user preference and is therefore valuable for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging because of selection bias, i.e., observed click events usually occur on items the user already prefers. Most existing methods use counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error imputation based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation can give the DR estimator a higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing, model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on this analysis, we propose a more robust doubly robust (MRDR) estimator that further reduces the variance while retaining double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which converts error imputation into general CVR estimation. We also verify empirically that the proposed learning scheme can further eliminate the high-variance problem of imputation learning. To evaluate its effectiveness, we conduct extensive experiments on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over state-of-the-art methods. The code is available at https://github.com/guosyjlu/MRDR-DL.
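To make the relationship between the three estimators concrete, the following is a minimal sketch that assumes their standard forms from the counterfactual-learning literature: IPS reweights observed errors by inverse propensity, EIB fills unobserved entries with an imputed error, and DR adds an IPS-weighted correction to the imputed error. All function and variable names here are illustrative assumptions and are not taken from the MRDR-DL repository; the MRDR estimator and the double learning procedure themselves are not reproduced.

```python
import numpy as np


def ips_estimate(errors, observed, propensities):
    """IPS: average of observed errors, each weighted by 1 / propensity."""
    # Unobserved entries contribute zero, whatever value `errors` holds there.
    weighted = np.where(observed == 1, errors / propensities, 0.0)
    return float(np.mean(weighted))


def eib_estimate(errors, observed, imputed):
    """EIB: true error where a click was observed, imputed error elsewhere."""
    filled = np.where(observed == 1, errors, imputed)
    return float(np.mean(filled))


def dr_estimate(errors, observed, propensities, imputed):
    """DR: imputed error plus an IPS-weighted correction on observed entries."""
    correction = np.where(observed == 1, (errors - imputed) / propensities, 0.0)
    return float(np.mean(imputed + correction))


if __name__ == "__main__":
    # Toy values chosen purely for illustration (not from the paper).
    errors = np.array([0.2, 0.9, 0.4, 0.1])        # per-pair prediction errors
    observed = np.array([1, 0, 1, 1])              # 1 if the item was clicked
    propensities = np.array([0.5, 0.2, 0.8, 0.4])  # estimated click propensities
    imputed = np.full(4, 0.3)                      # imputed errors

    print("IPS:", ips_estimate(errors, observed, propensities))
    print("EIB:", eib_estimate(errors, observed, imputed))
    print("DR: ", dr_estimate(errors, observed, propensities, imputed))
```

In this form, the DR estimate equals the mean of the true errors whenever the imputed errors are exact, whatever the propensities are. This is one half of the double-robustness property; the other half (accurate propensities) holds in expectation over the observation indicators rather than for a single sample.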