Modeling Stochastic Feedback for Long-Term User Engagement

Abstract: The ultimate goal of recommender systems (RS) is to improve user engagement. Reinforcement learning (RL) is a promising paradigm for this goal, as it directly optimizes the overall performance of sequential recommendation. However, many existing RL-based approaches incur substantial computational overhead, because they require storing not only the recommended items but also all other candidate items. This paper proposes an efficient alternative that does not require the candidate items: the idea is to model the correlation between user engagement and items directly from data. Moreover, the proposed approach accounts for randomness in user feedback and in termination behavior, both of which are ubiquitous in RS yet rarely discussed in prior RL-based work. Through online A/B experiments on a real-world RS, we confirm the efficacy of the proposed approach and the importance of modeling the two types of randomness.
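To make the two sources of randomness concrete, here is a minimal sketch of an objective under stochastic feedback and random termination. The notation is ours, not the paper's: s_t is the user state, a_t the recommended item, r_t the stochastic engagement feedback, T the random session-termination time, and p_exit the per-step exit probability (hazard):

J(\pi) \;=\; \mathbb{E}_\pi\!\Big[\textstyle\sum_{t=0}^{T-1} r_t\Big],
\qquad
r_t \sim P_r(\cdot \mid s_t, a_t),
\qquad
\Pr(T = t \mid T \ge t,\, s_t, a_t) \;=\; p_{\mathrm{exit}}(s_t, a_t).

Under this formulation, both P_r and p_exit could be estimated from logged interaction data alone, which is consistent with the abstract's claim that the approach needs the recommended items but not the full candidate set.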