Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there has been little study on Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than its SR counterpart for most E-commerce companies due to its much larger online serving demands as well as traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences into a single RNN pass within a training batch by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings over its state-of-the-art baseline, on a variety of offline and online metrics.
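To make the packing step concrete, below is a minimal sketch of one plausible reading of "solving a greedy knapsack problem on the fly": a first-fit-decreasing heuristic that packs short user sequences into fixed-capacity RNN passes. The helper name pack_sequences, the capacity parameter max_len, and the length-only packing criterion are illustrative assumptions, not the paper's implementation.

from typing import List

def pack_sequences(seq_lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedily pack short user sequences into RNN passes of capacity max_len.

    Hypothetical first-fit-decreasing sketch of the on-the-fly knapsack step:
    each returned bin holds the indices of sequences whose total length fits
    into one RNN pass; the paper's exact packing objective may differ.
    """
    order = sorted(range(len(seq_lengths)), key=lambda i: -seq_lengths[i])
    bins: List[List[int]] = []   # sequence indices assigned to each RNN pass
    remaining: List[int] = []    # unused capacity of each pass

    for i in order:
        length = seq_lengths[i]
        # Place the sequence into the first pass with enough free capacity.
        for b, free in enumerate(remaining):
            if length <= free:
                bins[b].append(i)
                remaining[b] -= length
                break
        else:
            # No existing pass fits: open a new one.
            bins.append([i])
            remaining.append(max_len - length)
    return bins

# Example: five sequences of lengths 5, 3, 8, 2 and 4, capacity 10 per pass.
print(pack_sequences([5, 3, 8, 2, 4], max_len=10))   # [[2, 3], [0, 4], [1]]

Packing by decreasing length is a standard heuristic for this kind of bin-packing/knapsack problem; in a training loop the same idea would be applied per batch, with each bin concatenated (and masked) into a single RNN pass.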