Legged robots have become increasingly capable and popular in recent years because of their potential to bring the mobility of autonomous agents to the next level. This work presents a deep reinforcement learning approach that uses Proximal Policy Optimisation to learn a robust Lidar-based perceptual locomotion policy in a partially observable environment. Visual perception is critical for actively overcoming challenging terrain. To this end, we propose a novel learning strategy, the Dynamic Reward Strategy (DRS), which serves as an effective heuristic for learning a versatile gait with a neural network architecture that does not require access to history data. Evaluated in a modified version of the OpenAI Gym environment, the proposed approach achieves a success rate above 90% on all tested challenging terrains.