To solve tasks in complex environments, robots need to learn from experience. Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn, limiting its deployment in the physical world. As a consequence, many advances in robot learning rely on simulators. However, learning in simulation fails to capture the complexity of the real world, is prone to simulator inaccuracies, and the resulting behaviors do not adapt to changes in the world. The Dreamer algorithm has recently shown great promise for learning from small amounts of interaction by planning within a learned world model, outperforming pure reinforcement learning in video games. Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment. However, it is unknown whether Dreamer can facilitate faster learning on physical robots. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators. Dreamer trains a quadruped robot to roll off its back, stand up, and walk from scratch and without resets in only 1 hour. We then push the robot and find that Dreamer adapts within 10 minutes to withstand perturbations or quickly roll over and stand back up. On two different robotic arms, Dreamer learns to pick and place multiple objects directly from camera images and sparse rewards, approaching human performance. On a wheeled robot, Dreamer learns to navigate to a goal position purely from camera images, automatically resolving ambiguity about the robot orientation. Using the same hyperparameters across all experiments, we find that Dreamer is capable of online learning in the real world, establishing a strong baseline. We release our infrastructure for future applications of world models to robot learning.
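To illustrate the core idea of planning with a learned world model, the sketch below fits a simple dynamics model from a small amount of (toy) real interaction and then evaluates candidate action sequences entirely inside that model, only executing the first action of the best sequence in the real environment. This is a minimal, hypothetical example for intuition, not Dreamer itself: Dreamer learns a latent world model and trains an actor-critic on imagined latent rollouts, whereas this sketch uses a linear model and random-shooting planning on a 1D point-mass task. All names (real_step, imagined_step, plan) are invented for this illustration.

```python
# Hypothetical sketch: learn a dynamics model from a few real transitions,
# then plan "in imagination" by rolling out candidate actions in the model.
# Toy 1D point-mass task; this is NOT the Dreamer algorithm.
import numpy as np

rng = np.random.default_rng(0)

def real_step(state, action):
    """Toy real environment: 1D point mass; reward is negative distance to goal 5.0."""
    next_state = state + 0.1 * action + rng.normal(scale=0.01)
    return next_state, -abs(next_state - 5.0)

# 1) Collect a small dataset of real transitions (the expensive part on a robot).
states, actions, next_states = [], [], []
s = 0.0
for _ in range(200):
    a = rng.uniform(-1.0, 1.0)
    s_next, _ = real_step(s, a)
    states.append(s)
    actions.append(a)
    next_states.append(s_next)
    s = s_next

# 2) Fit a simple world model: next_state ~ w_s * state + w_a * action + b.
X = np.stack([states, actions, np.ones(len(states))], axis=1)
w, *_ = np.linalg.lstsq(X, np.array(next_states), rcond=None)

def imagined_step(state, action):
    """Predicted transition and reward under the learned model."""
    next_state = w[0] * state + w[1] * action + w[2]
    return next_state, -abs(next_state - 5.0)

# 3) Plan in imagination: score random action sequences inside the model and
#    execute only the first action of the best one (receding horizon).
def plan(state, horizon=10, candidates=256):
    seqs = rng.uniform(-1.0, 1.0, size=(candidates, horizon))
    returns = np.zeros(candidates)
    for i, seq in enumerate(seqs):
        sim, total = state, 0.0
        for a in seq:
            sim, r = imagined_step(sim, a)
            total += r
        returns[i] = total
    return seqs[np.argmax(returns), 0]

s = 0.0
for _ in range(50):
    a = plan(s)
    s, _ = real_step(s, a)
print(f"final state after planning with the learned model: {s:.2f} (goal 5.0)")
```

The design point the sketch conveys is that trial and error happens mostly inside the learned model rather than on the real system; Dreamer replaces the random-shooting planner here with a policy and value function trained on imagined rollouts in a learned latent space.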