Deep reinforcement learning (deep RL) has emerged as an effective tool for developing controllers for legged robots. However, vanilla deep RL often requires a tremendous number of training samples and struggles to achieve robust behaviors. To exploit human experts' knowledge while avoiding time-consuming interactive teaching, researchers have investigated a novel policy architecture, Policies Modulating Trajectory Generators (PMTG), which builds a recurrent control loop by combining a parametric trajectory generator (TG) with a feedback policy network to achieve more robust behaviors using intuitive prior knowledge. In this work, we propose Policies Modulating Finite State Machine (PM-FSM), which replaces the TGs with contact-aware finite state machines (FSMs) that offer more flexible control of each leg. Compared with TGs, FSMs provide high-level management of each leg's motion generator and allow a flexible arrangement of states, which makes the learned behavior less vulnerable to unseen perturbations and challenging terrains. This design also gives the policy an explicit notion of contact events, which it can use to negotiate unexpected perturbations. We demonstrate that the proposed architecture achieves more robust behaviors in various scenarios, such as challenging terrains and external perturbations, on both simulated and real robots. The supplemental video can be found at: https://youtu.be/78cboMqTkJQ.
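To make the idea of a contact-aware FSM modulated by a policy more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a two-state machine per leg (stance and swing), nominal durations chosen arbitrarily, and two hypothetical policy outputs: `duration_scale`, which stretches or shortens the current state, and `residual`, which offsets the foot height. All class, function, and parameter names are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from enum import Enum


class LegState(Enum):
    STANCE = 0
    SWING = 1


@dataclass
class LegFSM:
    """Hypothetical per-leg state machine; all durations are in seconds."""
    state: LegState = LegState.STANCE
    elapsed: float = 0.0
    stance_time: float = 0.3  # assumed nominal value, not taken from the paper
    swing_time: float = 0.2   # assumed nominal value, not taken from the paper

    def step(self, dt, in_contact, duration_scale=1.0):
        """Advance by dt and return (state, phase), with phase in [0, 1]."""
        self.elapsed += dt
        if self.state is LegState.SWING:
            limit = self.swing_time * duration_scale
            # Contact-aware transition: a touchdown detected after the first
            # half of the swing ends the swing early (assumed guard that
            # ignores the contact still present right after liftoff).
            touchdown = in_contact and self.elapsed >= 0.5 * limit
            if touchdown or self.elapsed >= limit:
                self.state, self.elapsed = LegState.STANCE, 0.0
                return self.state, 0.0
        else:
            limit = self.stance_time * duration_scale
            # Stance ends purely on time in this sketch.
            if self.elapsed >= limit:
                self.state, self.elapsed = LegState.SWING, 0.0
                return self.state, 0.0
        return self.state, min(self.elapsed / limit, 1.0)


def foot_height(state, phase, step_height=0.05, residual=0.0):
    """Nominal foot height for the state, plus a policy residual (assumed)."""
    if state is LegState.SWING:
        return step_height * math.sin(math.pi * phase) + residual
    return residual
```

In a control loop, a policy network would read the robot observation together with each leg's `(state, phase)` pair and output `duration_scale` and `residual` for the next step, which closes the recurrent loop between the policy and the state machine described above.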