Central Pattern Generators (CPGs) have several properties desirable for locomotion: they generate smooth trajectories, are robust to perturbations, and are simple to implement. Although CPGs are conceptually promising, we argue that their full potential has so far been limited by insufficient sensory-feedback information. This paper proposes a new methodology for tuning CPG controllers through gradient-based optimization in a Reinforcement Learning (RL) setting. To the best of our knowledge, this is the first time CPGs have been trained in conjunction with a Multilayer Perceptron (MLP) network in a deep RL context. In particular, we show how CPGs can be integrated directly as the Actor in an Actor-Critic formulation. Additionally, we demonstrate how this change allows highly non-linear feedback from sensory perception to be integrated directly to reshape the oscillators' dynamics. Our results on a locomotion task using a single-leg hopper demonstrate that explicitly using the CPG as the Actor, rather than as part of the environment, yields a significant increase in the reward gained over time (six times more) compared with previous approaches. Furthermore, we show that our method without feedback reproduces results similar to those of prior work with feedback. Finally, we demonstrate how our closed-loop CPG progressively improves the hopping behaviour with longer training, relying only on basic reward functions.
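The idea of using the CPG itself as the Actor, with learned feedback reshaping the oscillator dynamics, can be illustrated with a minimal forward-pass sketch. The abstract does not name the oscillator model or where the feedback enters, so the code below assumes a Hopf oscillator (parameters `mu`, `omega`, `alpha`) whose y-dynamics receive an additive term from a small MLP. `TinyMLP`, `HopfCPGActor`, `rollout`, and the observation vectors are hypothetical names chosen for illustration, not the authors' implementation. Gradient-based tuning through a critic would require writing the same update in an autodiff framework, which is omitted here.

```python
import math
import random


class TinyMLP:
    """Hypothetical one-hidden-layer MLP mapping sensor readings to a feedback term."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def __call__(self, obs):
        hidden = [math.tanh(sum(w * o for w, o in zip(row, obs)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # tanh keeps the feedback bounded in [-1, 1]
        return math.tanh(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


class HopfCPGActor:
    """Hypothetical Hopf-oscillator CPG used as the actor; feedback enters the y-dynamics."""

    def __init__(self, mu=1.0, omega=2 * math.pi, alpha=10.0, gain=1.0, mlp=None):
        self.mu, self.omega, self.alpha, self.gain = mu, omega, alpha, gain
        self.mlp = mlp
        self.x, self.y = 0.1, 0.0  # small non-zero start so the state leaves the origin

    def act(self, obs, dt=0.001):
        # Open loop when no MLP is attached; closed loop otherwise (assumed additive feedback)
        feedback = self.mlp(obs) if self.mlp is not None else 0.0
        r2 = self.x ** 2 + self.y ** 2
        dx = self.alpha * (self.mu - r2) * self.x - self.omega * self.y
        dy = self.alpha * (self.mu - r2) * self.y + self.omega * self.x + feedback
        # Explicit Euler integration of the oscillator state
        self.x += dt * dx
        self.y += dt * dy
        return self.gain * self.x  # e.g. a joint position target


def rollout(actor, obs, steps, dt=0.001):
    """Step the actor with a fixed (hypothetical) observation vector and collect actions."""
    return [actor.act(obs, dt) for _ in range(steps)]


open_loop = rollout(HopfCPGActor(), [0.0, 0.0, 0.0], steps=5000)
closed_loop = rollout(HopfCPGActor(mlp=TinyMLP(n_in=3, n_hidden=8)), [0.5, -0.2, 0.1], steps=5000)
```

With `mlp=None` the actor is an open-loop CPG whose output settles to an amplitude close to `sqrt(mu)`; passing an MLP closes the loop, so sensory readings shift the oscillator state at every integration step rather than only modulating its output afterwards.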