Sequential recommendation holds the promise of inferring user preferences from historical behavior. Existing methods mostly assume that the historical information reflects a coherent user preference and deploy a unified model to predict the next behavior. However, user preferences are naturally diverse, and each user may have distinct interests, which makes the historical information a mixture of heterogeneous preferences. Inspired by this practical consideration, in this paper, we propose a novel sequential recommender model that disentangles different user preferences. The main building block of our idea is a behavior allocator, which determines how many sub-sequences the historical information should be decomposed into, and how to allocate each item to these sub-sequences. In particular, we formulate the disentanglement of user preferences as a Markov decision process, and design a reinforcement learning method to implement the behavior allocator. The reward in our model is designed to assign the target item to the nearest sub-sequence, and simultaneously encourage orthogonality among the generated sub-sequences. To prevent the disentangled sub-sequences from being too sparse, we introduce a curriculum reward, which adaptively penalizes the action of creating a new sub-sequence. We conduct extensive experiments on real-world datasets and compare our model with many state-of-the-art baselines to verify its effectiveness. Empirical results show that our model improves performance by about 7.42$\%$ and 11.98$\%$ on average in terms of NDCG and MRR, respectively.
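To make the reward design described above more concrete, the following is a minimal illustrative sketch (not the authors' implementation) of a reward that combines the three signals mentioned in the abstract: closeness of the target item to its nearest sub-sequence, orthogonality among sub-sequence representations, and a curriculum-style penalty for opening a new sub-sequence. The function name, weighting scheme, and penalty schedule are all assumptions introduced for illustration only.

\begin{verbatim}
# Illustrative sketch of the reward idea; names, weights, and the
# curriculum schedule are assumptions, not the paper's exact design.
import numpy as np

def reward(sub_seq_embs, target_emb, created_new_subseq, step,
           alpha=1.0, beta=0.1):
    """sub_seq_embs: (K, d) sub-sequence representations;
    target_emb: (d,) embedding of the target item."""
    eps = 1e-8
    # (1) Closeness: cosine similarity between the target item and the
    # nearest sub-sequence representation.
    sims = sub_seq_embs @ target_emb / (
        np.linalg.norm(sub_seq_embs, axis=1) *
        np.linalg.norm(target_emb) + eps)
    closeness = sims.max()

    # (2) Orthogonality: penalize pairwise similarity among the
    # generated sub-sequence representations.
    normed = sub_seq_embs / (
        np.linalg.norm(sub_seq_embs, axis=1, keepdims=True) + eps)
    gram = normed @ normed.T
    off_diag = gram - np.diag(np.diag(gram))
    k = len(sub_seq_embs)
    ortho_penalty = np.abs(off_diag).sum() / max(k * (k - 1), 1)

    # (3) Curriculum penalty: discourage creating a new sub-sequence,
    # with a penalty that grows as training progresses (one possible
    # schedule; the paper's adaptive schedule may differ).
    curriculum = beta * (1 - np.exp(-step / 1000.0)) if created_new_subseq else 0.0

    return closeness - alpha * ortho_penalty - curriculum
\end{verbatim}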