This paper presents a complete pipeline for learning continuous motion control policies for a mobile robot when only a non-differentiable physics simulator of robot-terrain interactions is available. Because the robot's multi-modal state estimation is also complex and difficult to simulate, we simultaneously learn a generative model that refines the simulator outputs. We propose a coarse-to-fine learning paradigm in which coarse motion planning alternates with imitation learning and policy transfer to the real robot. The policy is optimized jointly with the generative model. We evaluate the method on a real-world platform in a series of experiments.