Reinforcement learning based recommender systems (RL-based RS) aim to learn a good policy from a batch of collected data by casting sequential recommendation as a multi-step decision-making task. However, current RL-based RS benchmarks commonly have a large reality gap, because they rely on artificial RL datasets or semi-simulated RS datasets, and the trained policy is evaluated directly in the simulation environment. In real-world settings, not all recommendation problems are suitable to be transformed into reinforcement learning problems. Unlike previous academic RL research, RL-based RS suffers from extrapolation error and is difficult to validate thoroughly before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark, a new resource fully collected from industrial applications for training and evaluating RL algorithms with special attention to the above issues. It contains two datasets, tuned simulation environments, advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.
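Counterfactual policy evaluation estimates a new policy's value from data logged by a different (behavior) policy, without deploying the new policy. As a minimal sketch, the snippet below implements clipped inverse propensity scoring (IPS), one standard off-policy estimator; it is an illustration of the general technique, not necessarily the exact estimators shipped in RL4RS.

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs, clip=10.0):
    """Clipped inverse propensity scoring (IPS).

    Estimates the target policy's expected reward from logged data:
    each logged step carries its reward, the target policy's probability
    of the logged action, and the logging policy's probability of it.
    Importance weights are clipped to reduce variance.
    """
    weights = np.minimum(target_probs / logging_probs, clip)
    return float(np.mean(weights * rewards))

# Toy logged data (hypothetical values for illustration only):
rewards = np.array([1.0, 0.0, 1.0, 1.0])
target_probs = np.array([0.8, 0.1, 0.5, 0.9])   # pi_target(a|s)
logging_probs = np.array([0.5, 0.5, 0.5, 0.5])  # pi_logging(a|s)
print(ips_estimate(rewards, target_probs, logging_probs))  # -> 1.1
```

Because the importance weights up-weight actions the target policy prefers, the estimate (1.1) exceeds the naive average logged reward (0.75), reflecting that the target policy concentrates probability on the rewarded actions.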