Reinforcement learning based recommender systems (RL-based RS) aim to learn a good policy from a batch of collected data by casting sequential recommendation as a multi-step decision-making task. However, current RL-based RS benchmarks commonly have a large reality gap: they rely on artificial RL datasets or semi-simulated RS datasets, and the trained policy is evaluated directly in the simulation environment. In real-world situations, not every recommendation problem is suitable to be transformed into a reinforcement learning problem. Unlike in previous academic RL research, RL-based RS suffers from extrapolation error and is difficult to validate thoroughly before deployment. In this paper, we introduce the RL4RS (Reinforcement Learning for Recommender Systems) benchmark, a new resource fully collected from industrial applications for training and evaluating RL algorithms with special concern for the above issues. It contains two datasets, tuned simulation environments, related advanced RL baselines, data understanding tools, and counterfactual policy evaluation algorithms. The RL4RS suite can be found at https://github.com/fuxiAIlab/RL4RS. Beyond RL-based recommender systems, we expect the resource to contribute to research in reinforcement learning and neural combinatorial optimization.
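To make the counterfactual policy evaluation setting concrete, the sketch below shows one standard technique of this kind, clipped inverse propensity scoring (IPS), which estimates a target policy's value from logged interaction data without deploying it. This is a minimal illustration on synthetic data, not RL4RS's API; the function name, data, and clipping threshold are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged feedback (illustrative only): for each logged impression we
# keep the probability the logging policy assigned to the chosen action, the
# observed reward, and the probability a candidate target policy would assign
# to that same action in the same context.
n = 10_000
logging_probs = rng.uniform(0.05, 0.5, size=n)  # pi_b(a | x) of the logging policy
target_probs = rng.uniform(0.0, 0.6, size=n)    # pi_e(a | x) of the target policy
rewards = rng.binomial(1, 0.1, size=n).astype(float)  # e.g., click / no click

def ips_estimate(rewards, logging_probs, target_probs, clip=10.0):
    """Clipped IPS estimate of the target policy's expected reward.

    Importance weights pi_e / pi_b reweight logged rewards toward the target
    policy's action distribution; clipping caps the weights, trading a little
    bias for much lower variance on rare logging-policy actions.
    """
    weights = np.minimum(target_probs / logging_probs, clip)
    return float(np.mean(weights * rewards))

print("estimated target-policy value:", ips_estimate(rewards, logging_probs, target_probs))
```

An off-policy estimate like this is what lets a trained policy be sanity-checked against logged data before any online deployment, which is exactly the pre-deployment validation gap the abstract highlights.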