Recommender systems (RSs) have become an integral part of our everyday lives. They help us find items to purchase, friends on social networks, and movies to watch. Traditionally, recommendation was treated as a classification or prediction problem, but it is now widely agreed that formulating it as a sequential decision problem better reflects the interaction between the user and the system. The recommendation problem can therefore be formulated as a Markov decision process (MDP) and solved with reinforcement learning (RL) algorithms. Unlike traditional recommendation methods such as collaborative filtering and content-based filtering, RL can handle sequential, dynamic user-system interaction and take long-term user engagement into account. Although the idea of using RL for recommendation is not new and has been around for about two decades, it was not very practical, mainly because traditional RL algorithms do not scale well. A new trend emerged in the field after the introduction of deep reinforcement learning (DRL), which made it possible to apply RL to recommendation problems with large state and action spaces. In this paper, we present a survey of reinforcement learning based recommender systems (RLRSs). Our aim is to give an outlook on the field and to provide the reader with fairly complete knowledge of its key concepts. We first show that RLRSs can generally be classified into RL-based and DRL-based methods. We then propose an RLRS framework with four components, i.e., state representation, policy optimization, reward formulation, and environment building, and survey RLRS algorithms accordingly. We highlight emerging topics and depict important trends using various graphs and tables. Finally, we discuss important aspects and challenges that can be addressed in future work.
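To make the MDP view of recommendation and the four framework components concrete, here is a minimal toy sketch. It is not a method from the survey; everything in it is a hypothetical assumption for illustration: a catalogue of 4 items, a state defined as the last item the user clicked (state representation), a reward of 1 per click (reward formulation), a hand-written simulated user whose preferred next item is `(state + 1) % 4` (environment building), and classic tabular Q-learning as the policy optimizer (a traditional, non-deep RL algorithm of the kind that struggles to scale to large state and action spaces).

```python
import random

# Hypothetical toy catalogue: items are indexed 0..N_ITEMS-1.
N_ITEMS = 4


def simulated_user(state: int, action: int, rng: random.Random) -> tuple[int, float]:
    """Hypothetical environment: return (next_state, reward) for recommending `action`.

    The simulated user strongly prefers the item that "follows" the last clicked one.
    """
    preferred = (state + 1) % N_ITEMS
    click_prob = 0.9 if action == preferred else 0.1
    clicked = rng.random() < click_prob
    reward = 1.0 if clicked else 0.0           # reward formulation: 1 per click
    next_state = action if clicked else state  # state representation: last clicked item
    return next_state, reward


def train_q_learning(episodes=1000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Policy optimization with tabular Q-learning and epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * N_ITEMS for _ in range(N_ITEMS)]  # q[state][action]
    for _ in range(episodes):
        state = rng.randrange(N_ITEMS)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(N_ITEMS)  # explore
            else:
                action = max(range(N_ITEMS), key=lambda a: q[state][a])  # exploit
            next_state, reward = simulated_user(state, action, rng)
            # Q-learning update toward the bootstrapped target r + gamma * max_a' Q(s', a').
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each state (last clicked item), the item the learned policy would recommend."""
    return [max(range(N_ITEMS), key=lambda a: q[s][a]) for s in range(N_ITEMS)]


if __name__ == "__main__":
    print(greedy_policy(train_q_learning()))
```

Under these assumptions, the learned greedy policy should recommend each user's preferred next item, because that action maximizes the discounted sum of future clicks rather than only the immediate reward. A lookup table like `q` grows with the product of the state and action counts, which illustrates why DRL methods that approximate Q-values or policies with neural networks are needed for realistic catalogues.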