Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.
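To make the idea of a game-inspired curriculum more concrete, the following is a minimal sketch of how per-robot difficulty levels might be updated after each episode when thousands of simulated robots train in parallel. The abstract does not state the actual rule, so the function name `update_terrain_levels`, the `progress` measure, and the thresholds `promote_at` and `demote_at` are all hypothetical and chosen only for illustration.

```python
def update_terrain_levels(levels, progress, max_level,
                          promote_at=0.8, demote_at=0.4):
    """Return a new difficulty level for each simulated robot.

    levels:   current terrain difficulty per robot (0 is the easiest)
    progress: fraction of the commanded distance each robot covered
              in its last episode (hypothetical success measure)
    The thresholds are assumptions, not values from the paper.
    """
    new_levels = []
    for level, p in zip(levels, progress):
        if p >= promote_at:
            level += 1   # did well: move to harder terrain
        elif p < demote_at:
            level -= 1   # struggled: move to easier terrain
        # keep the level inside the valid range [0, max_level]
        new_levels.append(min(max(level, 0), max_level))
    return new_levels


if __name__ == "__main__":
    levels = [0, 3, 5, 0]
    progress = [0.9, 0.1, 1.0, 0.0]
    print(update_terrain_levels(levels, progress, max_level=5))
```

Because each robot carries its own level, a rule of this kind would let a single batch of parallel environments cover the whole range of difficulties at once, rather than moving all robots through the curriculum in lockstep.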
