Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments without requiring domain knowledge. Unfortunately, because of its sample inefficiency, deep RL has so far been applied primarily in simulated environments. In this work, we demonstrate that recent advances in machine learning algorithms and libraries, combined with a carefully tuned robot controller, make it possible to learn quadruped locomotion in only 20 minutes in the real world. We evaluate our approach on several indoor and outdoor terrains that are known to be challenging for classical model-based controllers. We observe that the robot consistently learns a walking gait on all of these terrains. Finally, we evaluate our design decisions in a simulated environment.