DeepCPG Policies for Robot Locomotion

Central Pattern Generators (CPGs) form the neural basis of the rhythmic behaviors observed in the locomotion of legged animals. Organized into networks, CPG dynamics allow complex locomotor behaviors to emerge. In this work, we take inspiration from this to develop walking behaviors in multi-legged robots. We present novel DeepCPG policies that embed CPGs as a layer in a larger neural network and facilitate end-to-end learning of locomotion behaviors in a deep reinforcement learning (DRL) setup. We demonstrate the effectiveness of this approach on insectoid robots simulated in a physics engine. We show that, compared to traditional approaches, DeepCPG policies allow sample-efficient end-to-end learning of effective locomotion strategies even with high-dimensional sensor spaces (vision). We scale the DeepCPG policies using a modular robot configuration and multi-agent DRL. Our results suggest that gradual complexification of these policies, with embedded priors and in a modular fashion, could achieve non-trivial sensor and motor integration on a robot platform. These results also indicate the efficacy of bootstrapping more complex intelligent systems from simpler ones based on biological principles. Finally, we present experimental results for a proof-of-concept insectoid robot system, for which DeepCPG policies were first learned in simulation and then transferred to real-world robots without any additional fine-tuning.
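
To make the idea of a CPG used as a network layer concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Kuramoto-style network of coupled phase oscillators, a common CPG model that is not necessarily the one used in this work. The names `cpg_step` and `cpg_layer`, the coupling `weights`, and the phase `biases` are all hypothetical. In a DRL policy, an upstream network would typically supply `freqs` and `amplitudes`, and the layer output would serve as joint targets.

```python
import math

TWO_PI = 2.0 * math.pi

def cpg_step(phases, freqs, weights, biases, dt=0.01):
    """One Euler step of a Kuramoto-style coupled phase-oscillator network.

    Hypothetical model: dphi_i/dt = 2*pi*f_i
        + sum_j w_ij * sin(phi_j - phi_i - b_ij)
    """
    n = len(phases)
    new_phases = []
    for i in range(n):
        # Coupling pulls oscillator i toward a phase lag of b_ij behind j.
        coupling = sum(
            weights[i][j] * math.sin(phases[j] - phases[i] - biases[i][j])
            for j in range(n)
        )
        dphi = TWO_PI * freqs[i] + coupling
        new_phases.append((phases[i] + dt * dphi) % TWO_PI)
    return new_phases

def cpg_layer(phases, freqs, amplitudes, weights, biases, steps=10, dt=0.01):
    """Integrate the oscillators for `steps` steps.

    Returns the updated phases (the layer's recurrent state) and one
    rhythmic joint target per oscillator.
    """
    for _ in range(steps):
        phases = cpg_step(phases, freqs, weights, biases, dt)
    targets = [a * math.cos(p) for a, p in zip(amplitudes, phases)]
    return phases, targets

if __name__ == "__main__":
    # Two oscillators coupled to settle in anti-phase, as in an alternating gait.
    weights = [[0.0, 5.0], [5.0, 0.0]]
    biases = [[0.0, math.pi], [-math.pi, 0.0]]
    phases, targets = cpg_layer(
        [0.0, 0.5], freqs=[1.0, 1.0], amplitudes=[0.3, 0.3],
        weights=weights, biases=biases, steps=2000,
    )
    print("phase difference:", (phases[1] - phases[0]) % TWO_PI)
    print("joint targets:", targets)
```

In this sketch the oscillator phases act as recurrent state, so a learned policy only has to modulate a few parameters rather than produce every joint command from scratch. That is one plausible reading of how an embedded rhythmic prior could help sample efficiency.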
