Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, to date, offline reinforcement learning from visual observations has been relatively under-explored, and there is a lack of understanding of where the remaining challenges lie. In this paper, we seek to establish simple baselines for continuous control in the visual domain. We show that simple modifications to two state-of-the-art vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform prior work and establish a competitive baseline. We rigorously evaluate these algorithms on both existing offline datasets and a new testbed for offline reinforcement learning from visual observations that better represents the data distributions present in real-world offline reinforcement learning problems, and we open-source our code and data to facilitate progress in this important domain. Finally, we present and analyze several key desiderata unique to offline RL from visual observations, including visual distractions and visually identifiable changes in dynamics.