Deep reinforcement learning (deep RL) has emerged as an effective tool for developing controllers for legged robots. However, simple neural network representations are known for their poor extrapolation ability, which makes the learned behaviors vulnerable to unseen perturbations or challenging terrains. To address this, researchers have investigated a novel architecture, Policies Modulating Trajectory Generators (PMTG), which combines trajectory generators (TGs) with feedback control signals to achieve more robust behaviors. In this work, we propose a finite state machine PMTG, which extends the PMTG framework by replacing its simple TGs with asynchronous finite state machines (Async FSMs). This design gives the policy an explicit notion of contact events, allowing it to negotiate unexpected perturbations. We demonstrate that the proposed architecture achieves more robust behaviors in various scenarios, such as challenging terrains or external perturbations, on both simulated and real robots. The supplemental video can be found at: http://youtu.be/XUiTSZaM8f0.
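To make the described architecture concrete, below is a minimal sketch of a PMTG-style control loop in which each leg's trajectory generator is a two-state asynchronous FSM whose swing-to-stance transition is triggered by a contact event. All names (`AsyncLegFSM`, `pmtg_step`, the frequency-modulation interface) are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np


class AsyncLegFSM:
    """Two-state (swing/stance) trajectory generator for one leg.

    Unlike a clock-driven TG, transitions are event-based: the FSM leaves
    stance on a policy-modulated timer and leaves swing when foot contact
    is detected, so each leg can react to perturbations independently
    (hence "asynchronous").
    """

    def __init__(self):
        self.state = "stance"
        self.phase = 0.0  # progress within the current state, in [0, 1]

    def step(self, dt, frequency, in_contact):
        # The policy modulates the nominal stepping frequency.
        self.phase += dt * frequency
        if self.state == "stance" and self.phase >= 1.0:
            self.state, self.phase = "swing", 0.0
        elif self.state == "swing" and (in_contact or self.phase >= 1.0):
            # A contact event ends swing early -- the explicit notion of
            # contact the abstract refers to.
            self.state, self.phase = "stance", 0.0
        return self.nominal_foot_target()

    def nominal_foot_target(self):
        # Placeholder open-loop foot trajectory; a real TG would output a
        # swing arc / stance line in the leg frame.
        z = 0.05 * np.sin(np.pi * self.phase) if self.state == "swing" else 0.0
        return np.array([0.0, 0.0, z])


def pmtg_step(policy, obs, fsms, dt):
    """One PMTG control step: action = TG output + policy residual, while
    the policy simultaneously modulates the TGs (here, their frequencies)
    and observes their internal phase."""
    fsm_state = np.array([fsm.phase for fsm in fsms])
    tg_params, residual = policy(np.concatenate([obs["proprio"], fsm_state]))
    targets = [
        fsm.step(dt, frequency=tg_params[i], in_contact=obs["contacts"][i])
        for i, fsm in enumerate(fsms)
    ]
    return np.concatenate(targets) + residual


if __name__ == "__main__":
    # Toy usage: a fixed-frequency "policy" with zero residual actions.
    fsms = [AsyncLegFSM() for _ in range(4)]
    policy = lambda x: (np.full(4, 2.0), np.zeros(12))  # 2 Hz per leg
    obs = {"proprio": np.zeros(8), "contacts": [False] * 4}
    for _ in range(100):
        action = pmtg_step(policy, obs, fsms, dt=0.01)
```

In this sketch, a blocked swing foot that touches down early immediately flips its FSM back to stance, and that event is visible to the policy through the FSM phase it observes; a purely clock-driven TG would instead continue its open-loop cycle regardless of contact.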