Learning various motor skills for quadrupedal robots is a challenging problem that requires careful design of task-specific mathematical models or reward descriptions. In this work, we propose to learn a single capable policy using deep reinforcement learning by imitating a large number of reference motions, including walking, turning, pacing, jumping, sitting, and lying. Building on the existing motion imitation framework, we first carefully design the observation space, the action space, and the reward function to improve the scalability of learning as well as the robustness of the final policy. In addition, we adopt a novel adaptive motion sampling (AMS) method, which maintains a balance between successful and unsuccessful behaviors. This technique allows the learning algorithm to focus on challenging motor skills while avoiding catastrophic forgetting. We demonstrate that the learned policy exhibits diverse behaviors in simulation by successfully tracking both the training dataset and out-of-distribution trajectories. We further validate the importance of the proposed learning formulation and the adaptive motion sampling scheme through experiments.
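To make the adaptive motion sampling idea concrete, the sketch below shows one plausible way to balance successful and unsuccessful behaviors: keep a running success estimate per reference motion and sample harder motions more often, while reserving probability mass for already-mastered motions so they are not forgotten. This is an illustrative assumption, not the paper's exact AMS rule; the class name AdaptiveMotionSampler and the parameters mix and ema are hypothetical.

```python
import numpy as np

class AdaptiveMotionSampler:
    """Illustrative sketch of adaptive motion sampling (AMS), assuming
    episode-level success/failure feedback per reference motion."""

    def __init__(self, num_motions, mix=0.5, ema=0.9):
        # Running success estimates, initialized to 0.5 (unknown difficulty).
        self.success = np.full(num_motions, 0.5)
        self.mix = mix  # blend between failure-driven and uniform sampling
        self.ema = ema  # smoothing factor for the success estimates

    def sample(self, rng=np.random):
        # Motions with a higher failure rate get a higher sampling probability.
        failure = 1.0 - self.success
        uniform = np.full_like(failure, 1.0 / len(failure))
        if failure.sum() > 0:
            adaptive = failure / failure.sum()
        else:
            adaptive = uniform
        # Mixing with a uniform term keeps well-tracked motions in the
        # training distribution, mitigating catastrophic forgetting.
        probs = self.mix * adaptive + (1.0 - self.mix) * uniform
        return rng.choice(len(probs), p=probs)

    def update(self, motion_id, succeeded):
        # Exponential moving average of episode success for this motion.
        self.success[motion_id] = (
            self.ema * self.success[motion_id] + (1.0 - self.ema) * float(succeeded)
        )
```

In use, the sampler would pick a reference motion index at the start of each rollout, and the trainer would report back whether the episode tracked that motion successfully, e.g. sampler.update(motion_id, tracked_ok).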