Intelligent agents need to generalize from past experience to achieve goals in complex environments. World models facilitate such generalization and allow learning behaviors from imagined outcomes to increase sample efficiency. While learning world models from image inputs has recently become feasible for some tasks, modeling Atari games accurately enough to derive successful behaviors has remained an open challenge for many years. We introduce DreamerV2, a reinforcement learning agent that learns behaviors purely from predictions in the compact latent space of a powerful world model. The world model uses discrete representations and is trained separately from the policy. DreamerV2 constitutes the first agent that achieves human-level performance on the Atari benchmark of 55 tasks by learning behaviors inside a separately trained world model. With the same computational budget and wall-clock time, DreamerV2 reaches 200M frames and surpasses the final performance of the top single-GPU agents IQN and Rainbow. DreamerV2 is also applicable to tasks with continuous actions, where it learns an accurate world model of a complex humanoid robot and solves standing up and walking from only pixel inputs.
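To make the "discrete representations" concrete: the world model encodes each image into a vector of categorical latent variables rather than Gaussian ones, and gradients are passed through the sampling step with a straight-through estimator. The snippet below is a minimal sketch of that idea, assuming PyTorch; the function name and tensor shapes are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn.functional as F

def sample_categorical_straight_through(logits):
    """Sample one-hot categorical latents with straight-through gradients.

    logits: tensor of shape (batch, groups, classes), e.g. a vector of
    32 categorical variables with 32 classes each (illustrative sizes).
    """
    probs = F.softmax(logits, dim=-1)
    # Discrete, non-differentiable sampling step.
    indices = torch.distributions.Categorical(probs=probs).sample()
    one_hot = F.one_hot(indices, num_classes=logits.shape[-1]).float()
    # Straight-through estimator: the forward pass uses the discrete
    # one-hot sample, the backward pass uses the gradient of the probs.
    return one_hot + probs - probs.detach()

# Usage sketch: encode an image into logits, then sample the latent.
logits = torch.randn(16, 32, 32, requires_grad=True)
latent = sample_categorical_straight_through(logits)
latent.sum().backward()  # gradients flow back to the logits
```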