Legged robots are physically capable of traversing a wide range of challenging environments, but designing controllers that are sufficiently robust to handle this diversity has been a long-standing challenge in robotics. Reinforcement learning presents an appealing approach for automating the controller design process and has been able to produce remarkably robust controllers when trained in a suitable range of environments. However, it is difficult to predict all the conditions the robot is likely to encounter during deployment and to enumerate them at training time. What if, instead of training controllers that are robust enough to handle any eventuality, we enable the robot to continually learn in any setting it finds itself in? This kind of real-world reinforcement learning poses a number of challenges, including efficiency, safety, and autonomy. To address these challenges, we propose a practical robot reinforcement learning system for fine-tuning locomotion policies in the real world. We demonstrate that a modest amount of real-world training can substantially improve performance during deployment, and this enables a real A1 quadrupedal robot to autonomously fine-tune multiple locomotion skills in a range of environments, including an outdoor lawn and a variety of indoor terrains.