Reinforcement learning based recommender systems (RL-based RS) aim to learn a good policy from a batch of collected data by casting sequential recommendation as a multi-step decision-making task. However, current RL-based RS benchmarks commonly have a large reality gap, because they rely on artificial RL datasets or semi-simulated RS datasets, and the trained policy is evaluated directly in the simulation environment. In real-world settings, not all recommendation problems are suitable to be transformed into reinforcement learning problems. Unlike previous academic RL research, RL-based RS suffers from extrapolation error and the difficulty of being well validated before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark, a new resource fully collected from industrial applications to train and evaluate RL algorithms with special concern for the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.
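The counterfactual policy evaluation mentioned above can be illustrated with the classic inverse propensity scoring (IPS) estimator, which evaluates a new policy offline from logged data without interacting with the environment. This is a minimal, self-contained sketch for intuition only; the function name and all numbers are illustrative and do not reflect the RL4RS implementation or its datasets.

```python
import numpy as np

def ips_estimate(rewards, target_probs, behavior_probs):
    """Inverse propensity scoring: reweight logged rewards by the
    ratio of target-policy to behavior-policy action probabilities,
    yielding an unbiased estimate of the target policy's value."""
    weights = np.asarray(target_probs) / np.asarray(behavior_probs)
    return float(np.mean(weights * np.asarray(rewards)))

# Illustrative logged data: observed rewards, and the probability each
# policy assigned to the logged action (values are made up).
rewards = [1.0, 0.0, 1.0, 0.0]
behavior_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.8, 0.2, 0.5, 0.1]

print(ips_estimate(rewards, target_probs, behavior_probs))  # 0.9
```

When behavior probabilities are small, the importance weights blow up and the estimate becomes high-variance, which is one reason benchmarks such as RL4RS ship multiple counterfactual evaluation algorithms rather than relying on a single estimator.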