Although users exhibit different sequential patterns, state-of-the-art recommendation strategies share a main drawback: they require a fixed sequence length of user-item interactions as input to train the models. This can limit recommendation accuracy, as in practice users' sequential behaviours vary. Depending on that variety, baseline strategies may ignore important sequential interactions or add noise to the models through redundant ones. To overcome this problem, in this study we propose the SAR model, which not only learns the sequential patterns but also adjusts the sequence length of user-item interactions in a personalized manner. We first design an actor-critic framework, in which an RL agent computes the optimal sequence length as an action, given the user's state representation at a certain time step. In addition, we optimize a joint loss function that aligns the accuracy of the sequential recommendations with the expected cumulative rewards of the critic network, while the actor network adapts the sequence length in a personalized manner. Our experimental evaluation on four real-world datasets demonstrates the superiority of the proposed model over several baseline approaches. Finally, we make our implementation publicly available at https://github.com/stefanosantaris/sar.
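The actor-critic idea described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear actor and critic, the state dimension, the candidate length range, and the reward and recommendation-loss values are all assumptions introduced for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 8     # size of the user's state representation (assumption)
MAX_SEQ_LEN = 10  # candidate sequence lengths 1..MAX_SEQ_LEN (assumption)

# Hypothetical linear actor: maps a state to a distribution over lengths.
W_actor = rng.normal(scale=0.1, size=(STATE_DIM, MAX_SEQ_LEN))
# Hypothetical linear critic: maps a state to a scalar value estimate.
w_critic = rng.normal(scale=0.1, size=STATE_DIM)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def select_length(state):
    """Actor: sample a personalized sequence length for this user state."""
    probs = softmax(state @ W_actor)
    length = rng.choice(MAX_SEQ_LEN, p=probs) + 1  # lengths are 1-based
    return length, probs

def joint_loss(state, length, probs, reward, rec_loss, beta=0.5):
    """Joint objective combining the recommendation loss with
    actor-critic terms (beta weighting is an assumption).

    reward   : observed reward for the chosen sequence length (given)
    rec_loss : accuracy loss of the sequential recommender (given)
    """
    value = state @ w_critic        # critic's expected-return estimate
    advantage = reward - value      # how much better than expected
    actor_loss = -np.log(probs[length - 1]) * advantage  # policy gradient
    critic_loss = advantage ** 2    # value-estimation error
    return rec_loss + beta * (actor_loss + critic_loss)

state = rng.normal(size=STATE_DIM)
length, probs = select_length(state)
loss = joint_loss(state, length, probs, reward=1.0, rec_loss=0.3)
```

In this sketch the actor personalizes the sequence length per user state, while the critic's value estimate supplies the baseline that the joint loss aligns with the recommendation accuracy term, mirroring the optimization described in the abstract.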