We consider training a binary classifier under delayed feedback (\emph{DF learning}). In conversion prediction for online advertising, for example, a user who clicks an ad but has not yet bought the item is first recorded as a negative sample; if the user later buys the item, the sample's label changes to positive. In DF learning, we observe samples over time and train a classifier at some point, so some samples that are labeled negative at training time would later turn positive. This setting arises in various real-world applications such as online advertising, where the user action can take place long after the first click. Owing to the delayed feedback, naively training a classifier on the positive and negative labels observed so far yields a biased classifier. One solution is to use only samples that have been observed for longer than a certain time window, assuming that these samples are correctly labeled. However, existing studies have reported that using only this subset of samples does not perform well, and that using all samples together with the time window assumption improves empirical performance. We extend these studies and propose a method with an unbiased and convex empirical risk constructed from all samples under the time window assumption. To demonstrate the soundness of the proposed method, we provide experimental results on a synthetic dataset and on an open dataset consisting of real traffic logs from online advertising.
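The labeling bias described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the proposed method: the function names, the uniform click and delay distributions, and the parameter values are all hypothetical, and conversion delays are assumed never to exceed the time window (one simple reading of the time window assumption). It compares the positive rate observed at training time with the true conversion rate, and checks that samples observed for at least the window carry correct labels.

```python
import random

WINDOW = 7.0    # hypothetical time window (e.g. days); an assumption
HORIZON = 30.0  # hypothetical time at which the classifier is trained


def simulate_delayed_feedback(n=20000, p_convert=0.3, seed=0):
    """Return (elapsed, converts, observed_positive) tuples.

    Assumption: a converting sample's delay is uniform on [0, WINDOW],
    so any sample observed for at least WINDOW is correctly labeled.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        click_time = rng.uniform(0.0, HORIZON)
        elapsed = HORIZON - click_time          # how long we have observed it
        converts = rng.random() < p_convert     # eventual (true) label
        delay = rng.uniform(0.0, WINDOW) if converts else None
        # A conversion is visible only if it happened before training time.
        observed_positive = converts and delay <= elapsed
        samples.append((elapsed, converts, observed_positive))
    return samples


def positive_rate(labels):
    labels = list(labels)
    return sum(labels) / len(labels)


samples = simulate_delayed_feedback()
true_rate = positive_rate(c for _, c, _ in samples)
naive_rate = positive_rate(o for _, _, o in samples)

# Time-window baseline: keep only samples observed for at least WINDOW.
windowed = [(e, c, o) for e, c, o in samples if e >= WINDOW]
windowed_rate = positive_rate(o for _, _, o in windowed)

print(f"true rate:     {true_rate:.3f}")
print(f"naive rate:    {naive_rate:.3f}  (recent conversions still missing)")
print(f"windowed rate: {windowed_rate:.3f}  using {len(windowed)} of {len(samples)} samples")
```

In this toy setting the naive labels undercount positives, while the windowed subset is correctly labeled but discards the most recent samples. That trade-off is the motivation for constructing a risk from all samples.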