This paper presents a learning framework that combines imitation learning, deep reinforcement learning, and control theory to achieve human-style locomotion for humanoids that is natural, dynamic, and robust. We propose two ways of introducing human bias: motion capture data and a dedicated Multi-Expert network structure. The Multi-Expert network structure smoothly blends behavioral features, and an augmented reward design combines task and imitation rewards. Because it is built on fundamental concepts from conventional humanoid control, the reward design is composable, tunable, and explainable. We rigorously validated and benchmarked the framework, which consistently produced robust locomotion behaviors across the test scenarios. We further demonstrated that it can learn robust and versatile policies in the presence of disturbances such as terrain irregularities and external pushes.
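The two ideas named in the abstract, smooth blending across experts and a reward composed of task and imitation terms, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the experts are plain linear maps, the blending weights come from a softmax gate, the reward terms use a Gaussian-shaped kernel, and all names (`blend_experts`, `composite_reward`, `w_task`, `w_imit`, `scale`) are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def blend_experts(state, gate_w, expert_ws):
    """Blend the outputs of several linear experts (illustrative assumption).

    gate_w:    (n_experts, state_dim) gating matrix
    expert_ws: list of (action_dim, state_dim) expert matrices
    Returns the blended action and the blending weights.
    """
    alphas = softmax(gate_w @ state)                      # weights sum to 1
    actions = np.stack([w @ state for w in expert_ws])    # (n_experts, action_dim)
    return alphas @ actions, alphas

def rbf(error, scale):
    """Map an error vector to (0, 1]; equals 1 when the error is zero."""
    return float(np.exp(-scale * np.sum(np.square(error))))

def composite_reward(task_err, imit_err, w_task=0.5, w_imit=0.5, scale=2.0):
    """Weighted sum of a task term and an imitation term (assumed form)."""
    return w_task * rbf(task_err, scale) + w_imit * rbf(imit_err, scale)

if __name__ == "__main__":
    state = np.array([1.0, -2.0, 0.5])
    gate_w = np.zeros((2, 3))                  # zero gate: uniform blending
    expert_ws = [np.eye(3), 2.0 * np.eye(3)]
    action, alphas = blend_experts(state, gate_w, expert_ws)
    print(alphas, action)
    print(composite_reward(np.zeros(2), np.array([0.1, 0.0])))
```

Because the gate produces continuous weights that sum to one, the blended action varies smoothly as the state changes, and each reward term can be reweighted or inspected independently, which is the sense of "composable" and "tunable" assumed in this sketch.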