Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, to date, offline reinforcement learning from visual observations with continuous action spaces has been relatively under-explored, and there is a lack of understanding of where the remaining challenges lie. In this paper, we seek to establish simple baselines for continuous control in the visual domain. We show that simple modifications to two state-of-the-art vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform prior work and establish a competitive baseline. We rigorously evaluate these algorithms on both existing offline datasets and a new testbed for offline reinforcement learning from visual observations that better represents the data distributions present in real-world offline RL problems, and open-source our code and data to facilitate progress in this important domain. Finally, we present and analyze several key desiderata unique to offline RL from visual observations, including visual distractions and visually identifiable changes in dynamics.