Reinforcement learning based recommender systems: A survey

Recommender systems (RSs) are becoming an inseparable part of our everyday lives. They help us find items to purchase, friends on social networks, and movies to watch. Traditionally, the recommendation problem was treated as a simple classification or prediction problem; however, it has since been shown to be sequential in nature. Accordingly, it can be formulated as a Markov decision process (MDP), and reinforcement learning (RL) methods can be employed to solve it. In fact, recent advances in combining deep learning with traditional RL methods, i.e., deep reinforcement learning (DRL), have made it possible to apply RL to recommendation problems with massive state and action spaces. In this paper, we present a survey of reinforcement learning based recommender systems (RLRSs). We first observe that algorithms developed for RLRSs can generally be classified into RL-based and DRL-based methods. We then present these methods grouped by the specific RL algorithm, e.g., Q-learning, SARSA, and REINFORCE, that is used to optimize the recommendation policy. We also provide tables detailing the MDP formulation of these methods and their evaluation schemes. Finally, we discuss important aspects and challenges that can be addressed in the future.
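To make the MDP formulation concrete, the following is a minimal sketch of tabular Q-learning on a toy recommendation task. It is an illustration under stated assumptions, not a method taken from the survey: the state is taken to be the last item the user clicked, the action is the item to recommend, and the reward is 1 on a click. The user model `simulated_click`, the helper names, and all hyperparameters are hypothetical.

```python
import random


def simulated_click(state: int, action: int, n_items: int) -> int:
    """Hypothetical user model (an assumption for illustration only):
    after clicking item `state`, the user clicks only item (state + 1) % n_items."""
    return 1 if action == (state + 1) % n_items else 0


def train_q_table(n_items=4, steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    State  = index of the last clicked item.
    Action = index of the item to recommend next.
    Reward = 1 if the simulated user clicks the recommendation, else 0.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_items for _ in range(n_items)]
    state = 0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(n_items)
        else:
            action = max(range(n_items), key=lambda a: q[state][a])

        reward = simulated_click(state, action, n_items)
        # The state changes only when the user clicks the recommended item.
        next_state = action if reward else state

        # Q-learning update: bootstrap from the best action in the next state.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued recommendation for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = train_q_table()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy should recommend, for each state, the one item the simulated user will click next. Real RLRSs face far larger state and action spaces, which is where function approximation and the DRL-based methods mentioned above come in.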
