This paper proposes a modular framework that generates robust biped locomotion by tightly coupling an analytical walking approach with deep reinforcement learning. The framework is composed of six main modules that are hierarchically connected to reduce overall complexity and increase flexibility. At its core is a dynamics model that abstracts the humanoid's dynamics into two masses, one representing the upper body and one the lower body. This model is used to design an adaptive reference trajectory planner and an optimal controller, both fully parametric. Furthermore, a learning framework based on a Genetic Algorithm (GA) and Proximal Policy Optimization (PPO) is developed to find the optimum parameters and to learn how to improve the robot's stability by moving its arms and changing its center of mass (COM) height. A set of simulations is performed in the official RoboCup 3D League simulation environment to validate the framework. The results show that the framework not only creates a fast and stable gait but also learns to improve upper body efficiency.