With the widespread adoption of deep learning, reinforcement learning (RL) has experienced a dramatic increase in popularity, scaling to previously intractable problems, such as playing complex games from pixel observations, sustaining conversations with humans, and controlling robotic agents. However, a wide range of domains remains inaccessible to RL due to the high cost and danger of interacting with the environment. Offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions, making it feasible to extract policies from large and diverse training datasets. Effective offline RL algorithms have a much wider range of applications than their online counterparts and are particularly appealing for real-world settings such as education, healthcare, and robotics. In this work, we contribute a unifying taxonomy to classify offline RL methods. Furthermore, we provide a comprehensive review of the latest algorithmic breakthroughs in the field using a unified notation, as well as a review of the properties and shortcomings of existing benchmarks. Additionally, we provide a figure that summarizes the performance of each method and class of methods on datasets with different properties, equipping researchers with the tools to decide which type of algorithm is best suited for the problem at hand and to identify which classes of algorithms look the most promising. Finally, we provide our perspective on open problems and propose future research directions for this rapidly growing field.