In this letter, we present a method for integrating central pattern generators (CPGs), i.e., systems of coupled oscillators, into the deep reinforcement learning (DRL) framework to produce robust and omnidirectional quadruped locomotion. The agent learns to directly modulate the intrinsic oscillator setpoints (amplitude and frequency) and to coordinate rhythmic behavior among the oscillators. This approach also allows DRL to be used to explore questions from neuroscience, namely the roles of descending pathways, interoscillator couplings, and sensory feedback in gait generation. We train our policies in simulation and perform a sim-to-real transfer to the Unitree A1 quadruped, where we observe behavior that is robust to disturbances unseen during training, most notably a dynamically added 13.75 kg load representing 115% of the nominal quadruped mass. We test several observation spaces based on proprioceptive sensing and show that our framework is deployable with no domain randomization and very little feedback: the observation space can consist of only the oscillator states and contact booleans. Video results can be found at https://youtu.be/xqXHLzLsEV4.
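To make the notion of oscillator setpoints concrete, the following is a minimal sketch of a single amplitude-controlled phase oscillator whose amplitude setpoint `mu` and frequency `omega` could be supplied by a learned policy. The abstract does not state the oscillator dynamics, so the second-order amplitude equation used here is an assumption (a form common in the CPG literature), and the function names, the gain `a`, and the time step `dt` are hypothetical. Coupling between oscillators and the mapping from oscillator state to foot positions are omitted.

```python
import math


def step_oscillator(r, r_dot, theta, mu, omega, a=50.0, dt=0.001):
    """Advance one oscillator by one Euler step (illustrative dynamics, assumed).

    r, r_dot : current amplitude and its time derivative
    theta    : current phase in radians
    mu       : amplitude setpoint (would come from the policy)
    omega    : frequency setpoint in rad/s (would come from the policy)
    a        : convergence gain (hypothetical value)
    """
    # Critically damped second-order dynamics that drive r toward mu.
    r_ddot = a * (a / 4.0 * (mu - r) - r_dot)
    r_dot = r_dot + r_ddot * dt
    r = r + r_dot * dt
    # The phase advances at the commanded frequency, wrapped to [0, 2*pi).
    theta = (theta + omega * dt) % (2.0 * math.pi)
    return r, r_dot, theta


def simulate(mu, omega, steps, dt=0.001):
    """Run one oscillator from rest with fixed setpoints; return (r, theta)."""
    r, r_dot, theta = 0.0, 0.0, 0.0
    for _ in range(steps):
        r, r_dot, theta = step_oscillator(r, r_dot, theta, mu, omega, dt=dt)
    return r, theta
```

In a full controller, one such oscillator would typically be instantiated per leg, with the policy emitting a fresh `mu` and `omega` for each leg at every control step; holding the setpoints fixed, as `simulate` does, only illustrates that the amplitude settles at its setpoint.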