Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning

Siyu Wang,Xiaocong Chen,Dietmar Jannach,Lina Yao

Reinforcement learning-based recommender systems have recently gained popularity. However, designing the reward function, on which the agent relies to optimize its recommendation policy, is often not straightforward. Exploring the causality underlying users' behavior can take the place of the reward function in guiding the agent to capture users' dynamic interests. Moreover, because of the typical limitations of simulation environments (e.g., data inefficiency), most existing work cannot be applied broadly at large scale. Although some works attempt to convert an offline dataset into a simulator, data inefficiency makes the learning process even slower. Because reinforcement learning learns by interaction, an agent cannot collect enough training data from a single interaction. Furthermore, traditional reinforcement learning algorithms, unlike supervised learning methods, lack a solid capability to learn directly from offline datasets. In this paper, we propose a new model, the causal decision transformer for recommender systems (CDT4Rec). CDT4Rec is an offline reinforcement learning system that learns from a dataset rather than from online interaction. It employs the transformer architecture, which can process large offline datasets and capture both short-term and long-term dependencies within the data, to estimate the causal relationships among action, state, and reward. To demonstrate the feasibility and superiority of our model, we conducted experiments on six real-world offline datasets and one online simulator.
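
The abstract does not say how logged interactions are presented to the transformer. As a rough illustration only, and not the authors' implementation, the sketch below assumes a logged user session is a list of (state, action, reward) steps. It flattens the most recent steps into one token sequence and builds a causal attention mask so that each position can attend only to itself and earlier positions. The names `interleave`, `causal_mask`, and the step layout are hypothetical and chosen for this example.

```python
from typing import List, Tuple, Union

# Hypothetical step layout: (state id, recommended item id, observed reward).
Step = Tuple[str, int, float]
Token = Tuple[str, Union[str, int, float]]


def interleave(trajectory: List[Step], context_len: int) -> List[Token]:
    """Flatten the last `context_len` steps into state, action, reward tokens."""
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    window = trajectory[-context_len:]
    tokens: List[Token] = []
    for state, action, reward in window:
        tokens.append(("state", state))
        tokens.append(("action", action))
        tokens.append(("reward", reward))
    return tokens


def causal_mask(n: int) -> List[List[bool]]:
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


# A logged session read from an offline dataset; no online interaction is needed.
logged_session: List[Step] = [("s0", 3, 1.0), ("s1", 7, 0.0), ("s2", 2, 1.0)]
tokens = interleave(logged_session, context_len=2)
mask = causal_mask(len(tokens))
```

Because the sequence comes entirely from logged data, a model trained on it can be fit in a supervised fashion, which is the property the abstract attributes to offline reinforcement learning. How CDT4Rec actually encodes states and estimates rewards is not described in this text.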