Rich user behavior data has proven highly valuable for Click-Through Rate (CTR) prediction, especially in industrial recommender, search, and advertising systems. However, it is non-trivial for real-world systems to make full use of long-term user behaviors because of the strict latency requirements of online serving. Most previous works adopt a retrieval-based strategy, in which a small number of user behaviors are retrieved first and then passed to a subsequent attention module. Retrieval-based methods are sub-optimal, however: they inevitably lose some information, and it is difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose \textbf{SDIM} (\textbf{S}ampling-based \textbf{D}eep \textbf{I}nterest \textbf{M}odeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample multiple hash functions to generate hash signatures for the candidate item and for each item in the user behavior sequence, and obtain the user interest by directly gathering the behavior items that share a hash signature with the candidate item. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on long-term user behavior modeling while being substantially faster. We also describe the deployment of SDIM in our system. Specifically, we decouple behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (Behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behavior sequences. Both offline and online experiments demonstrate the effectiveness of SDIM. SDIM has now been deployed online in the search system of the Meituan app.
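The hashing-and-gathering step described above can be illustrated with a minimal sketch. The abstract does not name the hash family, so this example assumes SimHash-style random hyperplane projections; the function names (`simhash_signatures`, `sampled_interest`), the `planes` tensor, the group count `m`, the signature width `tau`, and the per-group L2 normalization followed by averaging are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def simhash_signatures(x, planes):
    """Hash each row of x with m groups of tau random hyperplanes (assumed hash family).

    x:      (n, d) item embeddings
    planes: (m, tau, d) random hyperplanes
    returns (n, m) integer signatures, one per item and hash group
    """
    # One bit per hyperplane: which side of the plane the embedding falls on.
    bits = (np.einsum("nd,mtd->nmt", x, planes) > 0).astype(np.int64)
    # Pack the tau bits of each group into a single integer signature.
    weights = 1 << np.arange(planes.shape[1], dtype=np.int64)
    return bits @ weights


def sampled_interest(candidate, behaviors, planes):
    """Pool the behaviors whose signature collides with the candidate's.

    candidate: (d,) candidate item embedding
    behaviors: (T, d) user behavior sequence embeddings
    returns    (d,) user interest vector (zeros if nothing collides)
    """
    cand_sig = simhash_signatures(candidate[None, :], planes)[0]  # (m,)
    beh_sig = simhash_signatures(behaviors, planes)               # (T, m)
    # match[t, j] is 1 when behavior t shares the candidate's bucket in group j.
    match = (beh_sig == cand_sig[None, :]).astype(behaviors.dtype)
    pooled = match.T @ behaviors                                  # (m, d)
    # L2-normalize each group's sum (an assumption), leaving empty buckets at zero.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    pooled = np.divide(pooled, norms, out=np.zeros_like(pooled), where=norms > 0)
    return pooled.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, m, tau = 16, 1000, 8, 3
    planes = rng.standard_normal((m, tau, d))
    behaviors = rng.standard_normal((T, d))
    candidate = rng.standard_normal(d)
    print(sampled_interest(candidate, behaviors, planes).shape)
```

In this sketch the behavior signatures depend only on the behavior sequence and the sampled hyperplanes, not on the candidate, which is the property that would let a separate module precompute them away from the CTR server.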