We present a two-stage deep reinforcement learning algorithm for solving challenging control problems. Deep reinforcement learning (deep RL) has been an effective tool for many high-dimensional continuous control problems, but it struggles on problems with certain properties, such as sparse reward functions or sensitive dynamics. In this work, we propose an approach that decomposes the given problem into two stages: motion planning and motion imitation. The motion planning stage computes a feasible motion plan under approximated dynamics by sampling the state space directly rather than exploring random control signals. Once the motion plan is obtained, the motion imitation stage learns a control policy that tracks the plan with realistic sensors and actuation. We demonstrate the approach on two challenging control problems, rocket navigation and quadrupedal locomotion, which cannot be solved by the standard MDP formulation. The supplemental video can be found at: https://youtu.be/FYLo1Ov_8-g
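To make the two-stage decomposition concrete, the following is a minimal sketch on a toy 2D point, not the authors' implementation. It assumes an RRT-style planner as the state-space sampler, a per-step distance bound as the "approximated dynamics", and an exponential tracking reward for the imitation stage; the names `plan_by_state_sampling` and `imitation_reward`, the workspace bounds, and all constants are hypothetical.

```python
import math
import random


def plan_by_state_sampling(start, goal, step=0.5, goal_tol=0.5,
                           max_iters=5000, seed=0):
    """Stage 1 (sketch): grow a tree by sampling states, not controls.

    Assumption: the approximated dynamics is only a bound `step` on how far
    the state may move between consecutive plan nodes.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a target state directly (goal-biased 10% of the time).
        if rng.random() < 0.1:
            target = goal
        else:
            target = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
        # Extend the nearest tree node toward the sampled state.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        d = math.dist(nodes[i], target)
        if d == 0.0:
            continue
        scale = min(step, d) / d
        new = (nodes[i][0] + scale * (target[0] - nodes[i][0]),
               nodes[i][1] + scale * (target[1] - nodes[i][1]))
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) <= goal_tol:
            # Walk back to the root to recover the plan.
            path, j = [], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None


def imitation_reward(state, reference, scale=2.0):
    """Stage 2 (sketch): dense tracking reward in (0, 1].

    Assumption: an exponential of the squared tracking error; the reward is
    1 exactly when the state matches the reference plan state.
    """
    return math.exp(-scale * math.dist(state, reference) ** 2)


plan = plan_by_state_sampling(start=(1.0, 1.0), goal=(9.0, 9.0))
```

In a full pipeline, the second stage would presumably train a policy with any standard deep RL algorithm, using a reward of this kind evaluated against `plan[t]` at each time step, so that the sparse task reward is replaced by a dense tracking signal.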