In this paper, we propose a robust controller that achieves natural, stable, and fast locomotion on a real blind quadruped robot. Using only proprioceptive information, the robot can move at a maximum speed of 10 times its body length and can traverse a variety of complex terrains. The controller is trained in simulation with model-free reinforcement learning. The proposed loose neighborhood control architecture not only guarantees the learning rate but also yields an action network that is easy to transfer to a real quadruped robot. We find that data symmetry is lost during training, which leads to unbalanced performance of the learned controller on the left-right symmetric structure of the quadruped robot, and we propose a mirror-world neural network to solve this problem. The learned controller built on the mirror-world network gives the robot excellent anti-disturbance ability. No specific human knowledge, such as a foot trajectory generator, is used in the training architecture. The learned controller coordinates the robot's gait frequency with its locomotion speed, and the resulting locomotion pattern is more natural and reasonable than that of an artificially designed controller. The controller also generalizes well: it reaches locomotion speeds it was never trained on and traverses terrains it has never seen before.
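The abstract does not describe how the mirror-world network is built, so the following is only a minimal sketch of one common way to enforce left-right symmetry on a policy, not the authors' architecture. It averages a policy's output with the mirrored output for the mirrored observation, which makes the result equivariant under reflection. The joint layout (legs FL, FR, RL, RR, each with hip abduction, thigh, calf), the restriction of the observation to 12 joint positions, and the names `mirror`, `symmetrize`, and `raw_policy` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical joint layout: legs FL, FR, RL, RR, each (hip_abduction, thigh, calf).
# Mirroring swaps left and right legs and flips the sign of hip abduction.
_PERM = np.array([3, 4, 5, 0, 1, 2, 9, 10, 11, 6, 7, 8])
_SIGN = np.array([-1.0, 1.0, 1.0] * 4)


def mirror(v):
    """Reflect a 12-vector of joint quantities across the sagittal plane."""
    v = np.asarray(v, dtype=float)
    return _SIGN * v[_PERM]


def symmetrize(policy):
    """Wrap a policy so that mirrored observations give mirrored actions."""
    def symmetric_policy(obs):
        # Average the direct action with the action obtained in the mirrored world.
        return 0.5 * (policy(obs) + mirror(policy(mirror(obs))))
    return symmetric_policy


# Stand-in for a learned action network (assumption: a random single-layer map).
_rng = np.random.default_rng(0)
_W = _rng.normal(size=(12, 12))


def raw_policy(obs):
    return np.tanh(_W @ np.asarray(obs, dtype=float))


sym_policy = symmetrize(raw_policy)
```

Because `mirror` is linear and is its own inverse, `sym_policy(mirror(obs))` equals `mirror(sym_policy(obs))` for any observation, so a push from the left and the same push from the right produce mirrored responses. A real observation vector would also contain body velocities and orientation, each needing its own mirroring rule.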