In this paper, we present a framework for learning quadruped navigation by integrating central pattern generators (CPGs), i.e., systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Using both exteroceptive and proprioceptive sensing, the agent learns to modulate the intrinsic oscillator setpoints (amplitude and frequency) and to coordinate rhythmic behavior among the oscillators in order to track velocity commands while avoiding collisions with the environment. We compare neural network architectures (memory-free and memory-enabled) that learn implicit interoscillator couplings, and we vary the strength of the explicit coupling weights in the oscillator dynamics equations. We train our policies in simulation and transfer them to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that both memory-enabled policy representations and explicit interoscillator couplings are beneficial for successful sim-to-real transfer in navigation tasks. Video results can be found at https://youtu.be/O_LX1oLZOe0.
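To make the idea of setpoint modulation and explicit coupling concrete, the following is a minimal sketch of four coupled oscillators. The abstract does not state the dynamics equations, so this sketch assumes a commonly used amplitude-controlled phase oscillator form; the gain `a`, the time step, the leg ordering, the trot phase offsets, and the function names are all illustrative assumptions, not the paper's values. In a DRL setting, `mu` and `omega` would be set by the policy at each step; here they are held constant.

```python
import numpy as np

def cpg_step(r, r_dot, theta, mu, omega, w, phi, a=50.0, dt=0.001):
    """Advance N coupled oscillators by one Euler step (assumed dynamics).

    r, r_dot, theta : current amplitude, amplitude rate, and phase, shape (N,)
    mu, omega       : amplitude and frequency setpoints, shape (N,)
    w, phi          : explicit coupling weights and phase offsets, shape (N, N)
    """
    # Amplitude: critically damped second-order convergence of r toward mu.
    r_ddot = a * (a / 4.0 * (mu - r) - r_dot)
    # Phase: diff[i, j] = theta[j] - theta[i] - phi[i, j]
    diff = theta[None, :] - theta[:, None] - phi
    coupling = np.sum(r[None, :] * w * np.sin(diff), axis=1)
    theta_dot = omega + coupling
    # Semi-implicit Euler integration.
    r_dot = r_dot + dt * r_ddot
    r = r + dt * r_dot
    theta = np.mod(theta + dt * theta_dot, 2.0 * np.pi)
    return r, r_dot, theta

def trot_offsets():
    """Desired phase offsets for a trot; leg order is a hypothetical choice."""
    phase = np.array([0.0, np.pi, np.pi, 0.0])
    return phase[None, :] - phase[:, None]  # phi[i, j] = phase[j] - phase[i]

def simulate(steps=3000, coupling=5.0):
    """Run the oscillators with constant setpoints (stand-in for a policy)."""
    n = 4
    r = np.full(n, 0.5)
    r_dot = np.zeros(n)
    theta = np.array([0.3, 1.0, 2.0, 0.1])   # arbitrary initial phases
    mu = np.ones(n)                          # amplitude setpoint
    omega = np.full(n, 2.0 * np.pi * 2.0)    # 2 Hz frequency setpoint
    w = np.full((n, n), coupling)            # explicit coupling strength
    phi = trot_offsets()
    for _ in range(steps):
        r, r_dot, theta = cpg_step(r, r_dot, theta, mu, omega, w, phi)
    return r, theta
```

Calling `simulate(coupling=0.0)` removes the explicit coupling, which leaves each oscillator running independently at its own setpoint; in that case any coordination between legs would have to come from the policy itself.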