Click-through rate (CTR) prediction is a core task in cost-per-click (CPC) advertising systems and has been studied extensively by machine learning practitioners. While many existing methods have been successfully deployed in practice, most of them are built upon the i.i.d. (independent and identically distributed) assumption, ignoring that the click data used for training and inference is collected over time and is intrinsically non-stationary and drifting. This mismatch inevitably leads to sub-optimal performance. To address this problem, we formulate CTR prediction as a continual learning task and propose COLF, a hybrid COntinual Learning Framework for CTR prediction, which has a memory-based modular architecture designed to adapt, learn, and give predictions continuously when faced with non-stationary, drifting click data streams. Combined with a memory population method that explicitly controls the discrepancy between memory and target data, COLF is able to gain positive knowledge from its historical experience and make improved CTR predictions. Empirical evaluations on click logs collected from a major shopping app in China demonstrate our method's superiority over existing methods. Additionally, we have deployed our method online and observed significant CTR and revenue improvements, which further demonstrates our method's efficacy.
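To make the idea of a memory whose discrepancy from the target data is explicitly controlled more concrete, the following is a minimal illustrative sketch, not the COLF implementation. Everything in it is an assumption introduced for illustration: the class name `DiscrepancyGatedMemory`, the `capacity` and `threshold` parameters, and the use of the distance between feature means as the discrepancy measure. The abstract does not say which measure or admission rule COLF uses.

```python
from collections import deque

import numpy as np


def mean_discrepancy(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between the feature means of two batches.

    A crude drift proxy chosen for illustration; it is an assumption,
    not the discrepancy measure used by COLF.
    """
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


class DiscrepancyGatedMemory:
    """Hypothetical bounded memory that keeps only batches close to the target data."""

    def __init__(self, capacity: int, threshold: float):
        self.capacity = capacity    # maximum number of stored batches
        self.threshold = threshold  # maximum allowed discrepancy to the target
        self.batches = deque()

    def populate(self, candidate: np.ndarray, target: np.ndarray) -> bool:
        """Refresh the memory against `target`; return True if `candidate` was admitted."""
        # Evict stored batches that have drifted too far from the current target.
        self.batches = deque(
            b for b in self.batches
            if mean_discrepancy(b, target) <= self.threshold
        )
        # Reject a candidate that is itself too far from the target.
        if mean_discrepancy(candidate, target) > self.threshold:
            return False
        self.batches.append(candidate)
        # Enforce the capacity bound by dropping the oldest batches first.
        while len(self.batches) > self.capacity:
            self.batches.popleft()
        return True


# Synthetic usage: one historical batch near the target, one far from it.
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(500, 4))
near = rng.normal(0.1, 1.0, size=(500, 4))
far = rng.normal(3.0, 1.0, size=(500, 4))

memory = DiscrepancyGatedMemory(capacity=2, threshold=1.0)
admitted_near = memory.populate(near, target)  # close to the target: kept
admitted_far = memory.populate(far, target)    # strongly drifted: rejected
```

In this sketch, a model would train on the target data together with `memory.batches`, so that only historical data resembling the current distribution contributes. A production system would likely need a richer discrepancy measure than a difference of means.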