Long-Term User Behavior Modeling for CTR Prediction (Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction)

Rich user behavior data has proven to be of great value for Click-Through Rate (CTR) prediction, especially in industrial recommender, search, and advertising systems. However, it is non-trivial for real-world systems to make full use of long-term user behaviors because of the strict latency requirements of online serving. Most previous works adopt a retrieval-based strategy, in which a small number of user behaviors are retrieved first and then passed to an attention module. Retrieval-based methods are sub-optimal, however: they discard some information, and it is difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample multiple hash functions to generate hash signatures of the candidate item and of each item in the user behavior sequence, and obtain the user interest by directly gathering the behavior items that share a hash signature with the candidate item. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on modeling long-term user behaviors, while being substantially faster. We also describe the deployment of SDIM in our system. Specifically, we decouple behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (Behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behaviors. Both offline and online experiments are conducted to demonstrate the effectiveness of SDIM. SDIM has now been deployed online in the search system of the Meituan app.
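The hash-and-gather step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the hash family, so this example assumes SimHash (the sign of random projections), groups the hash bits into signatures of `width` bits each, and averages the L2-normalized pooled vectors across groups. The names `simhash_signatures`, `sdim_interest`, `planes`, and `width` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import numpy as np


def simhash_signatures(x, planes, width):
    """Hash each row of x into integer signatures (assumed SimHash scheme).

    x:      (n, d) item embeddings
    planes: (m, d) random projection vectors, one per hash function
    width:  number of hash bits packed into one signature; m % width == 0
    Returns an (n, m // width) array of integer signatures.
    """
    m = planes.shape[0]
    if m % width != 0:
        raise ValueError("number of hash functions must be divisible by width")
    bits = (x @ planes.T > 0).astype(np.int64)           # (n, m) sign bits
    groups = bits.reshape(x.shape[0], m // width, width)  # (n, g, width)
    weights = 2 ** np.arange(width, dtype=np.int64)       # pack bits into ints
    return groups @ weights                                # (n, g)


def sdim_interest(candidate, behaviors, planes, width):
    """Gather behavior items whose signature matches the candidate's.

    candidate: (d,) candidate item embedding
    behaviors: (n, d) user behavior sequence embeddings
    Returns a (d,) user interest vector.
    """
    cand_sig = simhash_signatures(candidate[None, :], planes, width)[0]  # (g,)
    beh_sig = simhash_signatures(behaviors, planes, width)               # (n, g)
    match = (beh_sig == cand_sig).astype(behaviors.dtype)                # (n, g)
    pooled = match.T @ behaviors       # (g, d): sum of matching items per group
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    # Groups with no matching item keep a zero vector.
    pooled = np.divide(pooled, norms, out=np.zeros_like(pooled), where=norms > 0)
    return pooled.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, width = 16, 48, 3                      # illustrative sizes only
    planes = rng.standard_normal((m, d))
    behaviors = rng.standard_normal((1000, d))   # a long behavior sequence
    candidate = rng.standard_normal(d)
    interest = sdim_interest(candidate, behaviors, planes, width)
    print(interest.shape)
```

In this sketch the behavior signatures returned by `simhash_signatures(behaviors, ...)` do not depend on the candidate item, so they could be computed ahead of time. That independence is what makes it possible to move behavior sequence hashing into a separate module, as the abstract describes for BSE.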
