We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles to taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g., ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g., a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse rewards and realistic visual inputs, VRL3 achieves an average of 780% better sample efficiency than the previous SOTA. On the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
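The three-stage pipeline described above can be sketched as follows. This is a minimal illustrative outline only; all function names and arguments are hypothetical placeholders, since the abstract does not specify implementation details.

```python
# Illustrative sketch of the VRL3 three-stage training pipeline.
# All names here are hypothetical stand-ins, not the actual VRL3 API.

class Encoder:
    """Stand-in for a convolutional visual encoder."""
    def __init__(self):
        self.stages_completed = []

def stage1_pretrain(encoder, non_rl_dataset):
    # Stage 1: learn task-agnostic visual representations from a
    # non-RL dataset (e.g. ImageNet).
    encoder.stages_completed.append("stage1:" + non_rl_dataset)
    return encoder

def stage2_offline_rl(encoder, offline_data):
    # Stage 2: turn task-agnostic representations into task-specific
    # ones using offline RL data (e.g. a few expert demonstrations).
    encoder.stages_completed.append("stage2:" + offline_data)
    return encoder

def stage3_online_finetune(encoder, env_name):
    # Stage 3: fine-tune the full agent with online RL interaction.
    encoder.stages_completed.append("stage3:" + env_name)
    return encoder

def vrl3_pipeline():
    enc = Encoder()
    enc = stage1_pretrain(enc, "imagenet")
    enc = stage2_offline_rl(enc, "expert_demos")
    enc = stage3_online_finetune(enc, "hand_manipulation")
    return enc.stages_completed
```

The key design point the abstract emphasizes is that each stage reuses the same encoder, progressively specializing it from generic visual features to task-specific control features.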