We present a deep reinforcement learning (deep RL) algorithm that combines learning-based motion planning and motion imitation to tackle challenging control problems. Deep RL has been an effective tool for solving many high-dimensional continuous control problems, but it struggles on problems with certain properties, such as sparse reward functions or sensitive dynamics. In this work, we propose an approach that decomposes the given problem into two deep RL stages: motion planning and motion imitation. The motion planning stage seeks to compute a feasible motion plan by leveraging the powerful planning capability of deep RL. Subsequently, the motion imitation stage learns a control policy that imitates the given motion plan under realistic sensor and actuation models. This new formulation adds only nominal cost for the user because both stages require minimal changes to the original problem. We demonstrate that our approach can solve two challenging control problems, rocket navigation and quadrupedal locomotion, which cannot be solved by a monolithic deep RL formulation or by a variant that uses a Probabilistic Roadmap.
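To make the two-stage decomposition concrete, below is a minimal, self-contained Python sketch. The toy environment, the random-search stand-in for a deep RL algorithm, and all names (PointMassEnv, train_policy) are illustrative assumptions for exposition, not the paper's actual implementation.

import numpy as np

class PointMassEnv:
    """Toy 1-D point mass illustrating the two stages: with no reference
    plan, the reward is sparse (reach the goal); given a stage-1 plan,
    the reward becomes a dense tracking term under tighter actuation."""
    def __init__(self, plan=None, relaxed=True):
        self.plan = plan        # reference trajectory from stage 1, if any
        self.relaxed = relaxed  # relaxed actuation limits ease planning
        self.action_dim = 1

    def reset(self):
        self.x, self.t = 0.0, 0
        return np.array([self.x])

    def step(self, action):
        limit = 1.0 if self.relaxed else 0.2  # "realistic" actuation is tighter
        self.x += float(np.clip(action[0], -limit, limit))
        self.t += 1
        if self.plan is None:   # planning stage: sparse goal reward
            reward = 1.0 if abs(self.x - 2.0) < 0.5 else 0.0
        else:                   # imitation stage: dense tracking reward
            ref = self.plan[min(self.t - 1, len(self.plan) - 1)][0]
            reward = -abs(self.x - ref)
        return np.array([self.x]), reward, self.t >= 20

def train_policy(env, episodes=200):
    """Stand-in for any deep RL algorithm (e.g. PPO or SAC): random search
    that returns the state trajectory of the best-return episode."""
    best_return, best_traj = -np.inf, None
    for _ in range(episodes):
        obs, traj, ret, done = env.reset(), [], 0.0, False
        while not done:
            action = np.random.uniform(-1.0, 1.0, env.action_dim)
            obs, reward, done = env.step(action)
            traj.append(obs)
            ret += reward
        if ret > best_return:
            best_return, best_traj = ret, traj
    return best_traj

# Stage 1: compute a feasible motion plan with relaxed dynamics.
plan = train_policy(PointMassEnv(relaxed=True))
# Stage 2: learn a policy that imitates the plan under realistic actuation,
# replacing the sparse reward with a dense tracking reward.
tracking_traj = train_policy(PointMassEnv(plan=plan, relaxed=False))

The point mirrored from the abstract is that the second stage replaces the sparse objective with a dense imitation reward, which is what makes the problem tractable for deep RL under realistic actuation.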