The delayed feedback problem is one of the pressing challenges in online advertising. It is caused by highly varied conversion feedback delays, which range from a few minutes to several days. It is hard to design an appropriate online learning system when these delays are non-identical across different types of ads and users. In this paper, we propose to tackle the delayed feedback problem in online advertising by "Following the Prophet" (FTP for short). The key insight is that, if the feedback arrived instantly for all the logged samples, we could obtain a model unaffected by delayed feedback, namely the "prophet". Although the prophet cannot be obtained during online learning, we show that we can predict the prophet's predictions with an aggregation policy on top of a set of multi-task predictions, where each task captures the feedback pattern of a different period. We propose the objective and optimization approach for the policy, and use the logged data to imitate the prophet. Extensive experiments on three real-world advertising datasets show that our method outperforms the previous state-of-the-art baselines.
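To make the aggregation idea concrete, the following toy sketch fits a set of softmax weights over per-task predictions so that their weighted sum imitates a prophet's predictions on logged data. It is an illustration under strong assumptions, not the paper's objective or architecture: the weights are global rather than produced per sample by a policy, the loss is a plain squared error, and the names `fit_policy`, `aggregate`, and `mse` as well as the numbers in the demo are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(weights, task_preds):
    """Weighted sum of the per-task predictions for one sample."""
    return sum(w * p for w, p in zip(weights, task_preds))


def mse(weights, all_task_preds, prophet_preds):
    """Mean squared error between aggregated and prophet predictions."""
    errors = [
        (aggregate(weights, preds) - target) ** 2
        for preds, target in zip(all_task_preds, prophet_preds)
    ]
    return sum(errors) / len(errors)


def fit_policy(all_task_preds, prophet_preds, steps=5000, lr=2.0):
    """Fit global softmax weights by gradient descent on the squared error.

    Assumption: one weight per task shared by all samples. A real policy
    would typically condition the weights on sample features.
    """
    k = len(all_task_preds[0])
    n = len(all_task_preds)
    logits = [0.0] * k  # start from uniform weights
    for _ in range(steps):
        w = softmax(logits)
        grad = [0.0] * k
        for preds, target in zip(all_task_preds, prophet_preds):
            agg = aggregate(w, preds)
            err = agg - target
            # d(agg)/d(logit_j) = w_j * (p_j - agg) for softmax weights
            for j in range(k):
                grad[j] += 2.0 * err * w[j] * (preds[j] - agg) / n
        logits = [z - lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical logged data: three tasks (short, medium, long feedback
    # windows) and prophet predictions that happen to match the medium task.
    task_preds = [
        [0.10, 0.30, 0.60],
        [0.20, 0.50, 0.90],
        [0.05, 0.10, 0.40],
        [0.30, 0.70, 0.80],
    ]
    prophet = [0.30, 0.50, 0.10, 0.70]
    weights = fit_policy(task_preds, prophet)
    print("weights:", weights)
    print("mse:", mse(weights, task_preds, prophet))
```

In this toy setup the prophet targets can only be computed offline, from logged samples whose feedback has fully arrived, which mirrors the abstract's point that the prophet is unavailable during online learning but can be imitated from logs.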