Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion

Recently, reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, as it can save the manual effort required by conventional approaches such as designing skill-specific controllers. However, due to the complex nonlinear dynamics of quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over a balance beam. To alleviate this difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, our generator continually optimizes the shape of its output trajectory for the given task, providing diversified motion priors to guide policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We then optimize the trajectory generator and the policy network alternately to stabilize training, and share the exploratory data to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through a cave. To further verify the effectiveness of our approach, we deploy the controller learned in simulation on a 12-DoF quadrupedal robot, which successfully traverses challenging scenarios with efficient gaits.
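The abstract combines two mechanisms: the policy's output is added as a residual on top of a foot trajectory generator (TG), and the TG's own parameters are optimized by an evolutionary method, alternating with the RL policy update. The Python sketch below illustrates only that structure, under loudly stated assumptions. The TG parameterization (`step_length`, `swing_height`, `frequency`), the residual clip `limit`, the plain antithetic evolution strategy in `es_step`, and the `toy_return` fitness are all hypothetical stand-ins rather than the authors' design. The RL update of the residual policy is omitted entirely.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TGParams:
    """Hypothetical TG parameters; the paper's actual parameterization may differ."""
    step_length: float   # fore-aft amplitude of the foot path (m)
    swing_height: float  # peak foot lift during swing (m)
    frequency: float     # gait cycle frequency (Hz)

    def as_list(self) -> List[float]:
        return [self.step_length, self.swing_height, self.frequency]

    @staticmethod
    def from_list(v: List[float]) -> "TGParams":
        return TGParams(*v)


def tg_foot_target(p: TGParams, t: float, phase_offset: float = 0.0) -> Tuple[float, float]:
    """Open-loop foot target (x, z) for one leg at time t, relative to its nominal stance point."""
    phase = (2.0 * math.pi * p.frequency * t + phase_offset) % (2.0 * math.pi)
    # x sweeps from -L to +L during swing (foot moves forward) and back during stance.
    x = -p.step_length * math.cos(phase)
    # First half of the cycle is swing (foot lifted); second half is stance (z = 0).
    z = p.swing_height * math.sin(phase) if phase < math.pi else 0.0
    return x, z


def compose_action(tg_target: Tuple[float, ...], residual: Tuple[float, ...],
                   limit: float = 0.05) -> Tuple[float, ...]:
    """Final command = TG prior + policy residual, with the residual clipped to +/- limit."""
    return tuple(a + max(-limit, min(limit, r)) for a, r in zip(tg_target, residual))


def es_step(params: TGParams, fitness: Callable[[TGParams], float], rng: random.Random,
            sigma: float = 0.02, lr: float = 0.05, pairs: int = 4) -> TGParams:
    """One antithetic evolution-strategy step that ascends `fitness` (a generic ES, assumed here)."""
    theta = params.as_list()
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness(TGParams.from_list([t + sigma * e for t, e in zip(theta, eps)]))
        f_minus = fitness(TGParams.from_list([t - sigma * e for t, e in zip(theta, eps)]))
        for i, e in enumerate(eps):
            # Finite-difference estimate of the fitness gradient along eps, averaged over pairs.
            grad[i] += (f_plus - f_minus) * e / (2.0 * sigma * pairs)
    return TGParams.from_list([t + lr * g for t, g in zip(theta, grad)])


# Hypothetical optimum used only by the toy fitness below.
TARGET = TGParams(step_length=0.12, swing_height=0.06, frequency=2.0)


def toy_return(p: TGParams) -> float:
    """Stand-in for an episode return; a real setup would roll out TG + policy in simulation."""
    return -sum((a - b) ** 2 for a, b in zip(p.as_list(), TARGET.as_list()))


def train(iterations: int = 200, seed: int = 0) -> TGParams:
    rng = random.Random(seed)
    params = TGParams(step_length=0.05, swing_height=0.02, frequency=1.0)
    for _ in range(iterations):
        # Step 1: evolve the TG while the residual policy is held fixed.
        params = es_step(params, toy_return, rng)
        # Step 2 (omitted): update the residual policy with an RL algorithm,
        # possibly reusing transitions collected during the ES rollouts.
    return params


if __name__ == "__main__":
    tg = train()
    # Trot-like phase offsets for FL, FR, RL, RR (an illustrative choice).
    for offset in (0.0, math.pi, math.pi, 0.0):
        print(compose_action(tg_foot_target(tg, 0.1, offset), (0.01, 0.0)))
```

Clipping the residual keeps the policy's correction small relative to the TG prior, which is one common way to let the TG shape the gait early in training; the actual bound, if any, used by the authors is not stated in the abstract.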

翻译：最近,强化学习(RL)已成为四重运动的有希望的办法,可以节省在设计特定技能控制器等常规方法中的手工工作。然而,由于四重机器人的复杂非线性动态和奖励宽度,RL仍难以从零开始学到有效的步数,特别是在诸如在平衡光束上行走等具有挑战性的任务中。为了减轻这种困难,我们提议了一个新的基于RL的基于RL的方法,其中包含一个进化轨道轨道生成器。与以前使用固定轨道生成器的方法不同,发电机不断优化特定任务产出轨迹的形状,在指导政策学习之前提供多样化的动作。该政策经过培训后,将强化学习以输出适合不同阵数的剩余控制信号。然后我们优化轨道生成器和政策网络,以稳定培训并分享探索性数据来提高样品效率。结果是,我们的方法可以通过从零开始学习,包括走在平衡上行走和爬过洞洞,来解决一系列具有挑战性的任务。为了进一步核实我们的方法的有效性,我们将控制器运用在12-Do 象模型中成功学习了具有挑战性的机器人。