Among the obstacles to applying reinforcement learning (RL) to real-world problems, two factors are critical: limited data and the mismatch between the testing environment and the training one. In this paper, we address both issues simultaneously through the problem setup of distributionally robust offline RL. Specifically, we learn an RL agent from historical data collected in the source environment and optimize it to perform well in a perturbed environment. Moreover, we adopt linear function approximation so that the algorithm scales to large problems. We prove that our algorithm achieves a suboptimality of $O(1/\sqrt{K})$ with a dependence on the linear feature dimension $d$, which, to our knowledge, is the first sample complexity guarantee in this setting. Diverse experiments are conducted to corroborate our theoretical findings, demonstrating the superiority of our algorithm over the non-robust baseline.
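As an illustrative sketch only (the notation below is assumed for exposition and is not the paper's exact statement), the distributionally robust value and the suboptimality metric that the $O(1/\sqrt{K})$ rate refers to commonly take the form
$$
V^{\pi}_{\mathrm{rob}}(s) \;=\; \inf_{P \in \mathcal{P}(P^{0})} \mathbb{E}_{P,\pi}\!\Big[\sum_{h=1}^{H} r_h(s_h, a_h) \,\Big|\, s_1 = s\Big],
\qquad
\mathrm{SubOpt}(\hat{\pi}) \;=\; V^{\pi^{\star}}_{\mathrm{rob}}(s_1) - V^{\hat{\pi}}_{\mathrm{rob}}(s_1),
$$
where $P^{0}$ denotes the nominal (source) transition kernel, $\mathcal{P}(P^{0})$ an uncertainty set of perturbed kernels around it, $K$ the number of offline episodes, and $d$ the dimension of the linear features used to approximate the robust value function.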