Capturing Delayed Feedback in Conversion Rate Prediction via Elapsed-Time Sampling

Conversion rate (CVR) prediction is one of the most critical tasks in digital display advertising. Commercial systems often need to update models through online learning to keep up with the evolving data distribution. However, conversions usually do not happen immediately after a user click, which can lead to inaccurate labels; this is known as the delayed feedback problem. Previous studies handle delayed feedback either by waiting a long time for the positive label, or by consuming the negative sample on arrival and then inserting a positive duplicate when a conversion happens later. There is a trade-off between waiting for more accurate labels and utilizing fresh data, which existing work does not consider. To strike a balance, we propose the Elapsed-Time Sampling Delayed Feedback Model (ES-DFM), which models the relationship between the observed conversion distribution and the true conversion distribution. We then optimize the expectation under the true conversion distribution via importance sampling under the elapsed-time sampling distribution. We further estimate the importance weight for each instance, which is used as the weight of the loss function in CVR prediction. To demonstrate the effectiveness of ES-DFM, we conduct extensive experiments on a public dataset and a private industrial dataset. Experimental results confirm that our method consistently outperforms the previous state-of-the-art results.
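
The idea of using a per-instance importance weight as the weight of the loss can be illustrated with a minimal sketch. This is not the ES-DFM implementation: the abstract does not give the form of the weights, so the function name `weighted_bce` and the values in `weights` are hypothetical placeholders. In the paper the weights are estimated per instance; here they are hand-picked only to show how they enter a binary cross-entropy loss.

```python
import numpy as np


def weighted_bce(y_obs, p_pred, weights, eps=1e-7):
    """Importance-weighted binary cross-entropy, averaged over instances.

    y_obs   : observed labels (0 or 1), possibly wrong because of delay
    p_pred  : predicted conversion probabilities from the CVR model
    weights : per-instance importance weights (hypothetical values here)
    """
    # Clip predictions so that log() never sees exactly 0 or 1.
    p = np.clip(p_pred, eps, 1.0 - eps)
    per_instance = -(y_obs * np.log(p) + (1.0 - y_obs) * np.log(1.0 - p))
    return float(np.mean(weights * per_instance))


# Toy batch: observed labels and model predictions.
y_obs = np.array([1.0, 0.0, 0.0, 1.0])
p_pred = np.array([0.8, 0.3, 0.1, 0.6])

# Assumption: placeholder weights standing in for estimated importance weights.
weights = np.array([1.2, 0.7, 0.9, 1.1])

loss = weighted_bce(y_obs, p_pred, weights)
unweighted = weighted_bce(y_obs, p_pred, np.ones_like(y_obs))
print(f"weighted loss:   {loss:.4f}")
print(f"unweighted loss: {unweighted:.4f}")
```

With all weights equal to 1 the function reduces to ordinary binary cross-entropy, so the weights are the only place where a correction for the gap between observed and true labels would enter.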
