We propose a simple but powerful data-driven framework for solving highly challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles to taking a data-driven approach, and present a suite of design principles, training strategies, and critical insights for data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g., ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g., a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse rewards and realistic visual inputs, our framework learns 370%-1200% faster than the prior state-of-the-art (SOTA) method while using an encoder that is 50 times smaller, demonstrating the potential of data-driven deep reinforcement learning.
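The three-stage pipeline described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: every class, function, and dataset name here is a hypothetical placeholder, and the real stages would involve actual representation learning, offline RL, and online RL training loops.

```python
# Hedged sketch of the three-stage data-driven DRL pipeline.
# All names below are illustrative placeholders, not from the paper.

class Encoder:
    """Stand-in for the (small) visual encoder; tracks its training stage."""
    def __init__(self):
        self.stage = "random-init"

def pretrain_task_agnostic(encoder, non_rl_dataset):
    # Stage 1: learn task-agnostic visual representations from
    # non-RL data (e.g., ImageNet).
    encoder.stage = "task-agnostic"
    return encoder

def offline_rl_adapt(encoder, demonstrations):
    # Stage 2: use offline RL data (e.g., a limited number of expert
    # demonstrations) to convert the representations into task-specific ones.
    encoder.stage = "task-specific"
    return encoder

def online_rl_finetune(encoder, env):
    # Stage 3: fine-tune the full agent with online RL in the target
    # environment.
    encoder.stage = "fine-tuned"
    return encoder

def run_pipeline():
    enc = Encoder()
    enc = pretrain_task_agnostic(enc, non_rl_dataset="ImageNet")
    enc = offline_rl_adapt(enc, demonstrations="expert-demos")
    enc = online_rl_finetune(enc, env="hand-manipulation")
    return enc.stage
```

The key design point the sketch captures is the ordering: generic pretraining first, task adaptation from limited offline data second, and expensive online interaction only at the end, once the representations are already useful.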