ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as the action space for continuous control tasks. We propose to lift the action space to a higher level, in the form of subgoals for a motion generator (a combination of a motion planner and a trajectory executor). We argue that, by lifting the action space and leveraging sampling-based motion planners, we can efficiently use RL to solve complex, long-horizon tasks that existing RL methods cannot solve in the original action space. We propose ReLMoGen, a framework that combines a learned policy that predicts subgoals with a motion generator that plans and executes the motion needed to reach them. To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, which are navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, which are manipulation tasks that require moving the robot base. These problems are challenging because they are usually long-horizon, hard to explore during training, and comprise alternating phases of navigation and interaction. We benchmark our method on a diverse set of seven robotics tasks in photo-realistic simulation environments. In all settings, ReLMoGen outperforms state-of-the-art Reinforcement Learning and Hierarchical Reinforcement Learning baselines. ReLMoGen also transfers well between different motion generators at test time, which indicates strong potential for transfer to real robots.
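
The division of labor described in the abstract, where a policy chooses subgoals and a motion generator plans and executes the motion to reach them, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2D point robot, the function names (`plan_straight_line`, `greedy_subgoal_policy`, `run_episode`), and all parameters are hypothetical. The straight-line planner is only a stand-in for a sampling-based motion planner, and the greedy rule is a stand-in for a learned policy.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def plan_straight_line(start: Point, goal: Point, step: float = 0.1) -> List[Point]:
    """Hypothetical stand-in planner: evenly spaced waypoints ending at the goal.

    A real motion generator would use a sampling-based planner with
    collision checking; this placeholder ignores obstacles entirely.
    """
    dist = math.dist(start, goal)
    n = max(1, math.ceil(dist / step))
    return [
        (start[0] + (goal[0] - start[0]) * i / n,
         start[1] + (goal[1] - start[1]) * i / n)
        for i in range(1, n + 1)
    ]


def greedy_subgoal_policy(state: Point, target: Point, reach: float = 1.0) -> Point:
    """Hypothetical placeholder for the learned policy.

    Returns a subgoal at most `reach` away from the state, toward the target.
    """
    dist = math.dist(state, target)
    if dist <= reach:
        return target
    scale = reach / dist
    return (state[0] + (target[0] - state[0]) * scale,
            state[1] + (target[1] - state[1]) * scale)


def run_episode(
    start: Point,
    target: Point,
    policy: Callable[[Point, Point], Point],
    planner: Callable[[Point, Point], List[Point]],
    max_subgoals: int = 20,
    tol: float = 1e-6,
) -> Tuple[Point, int, int]:
    """Run the two-level loop and count decisions at each level."""
    state = start
    subgoals_used = 0
    low_level_steps = 0
    for _ in range(max_subgoals):
        if math.dist(state, target) <= tol:
            break
        # High level: one policy action is one subgoal.
        subgoal = policy(state, target)
        # Low level: the motion generator plans, then executes every waypoint.
        for waypoint in planner(state, subgoal):
            state = waypoint
            low_level_steps += 1
        subgoals_used += 1
    return state, subgoals_used, low_level_steps


if __name__ == "__main__":
    final, n_subgoals, n_steps = run_episode(
        (0.0, 0.0), (3.0, 4.0), greedy_subgoal_policy, plan_straight_line
    )
    print(final, n_subgoals, n_steps)
```

In this toy setting each policy decision covers roughly ten low-level waypoints, which is the sense in which lifting the action space shortens the horizon the policy has to reason over. Because `run_episode` only depends on the planner's call signature, a different planner can be passed in without changing the policy, which loosely mirrors the idea of swapping motion generators at test time.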
