Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample-efficiency by concurrently learning an internal model of the world, and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. Through offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements on the Atari100k benchmark over a baseline trained from scratch; we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances. Project page: https://nicklashansen.github.io/xtra.
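To make the two-stage recipe concrete, below is a minimal sketch of offline multi-task pretraining of a world model followed by finetuning on a new task. Everything here (the `WorldModel` architecture, `fake_batch` data, the one-step prediction loss, and all hyperparameters) is an illustrative assumption; the actual XTRA implementation builds on EfficientZero and is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldModel(nn.Module):
    """Toy latent dynamics model: encode an observation, then predict the
    next latent state and the reward for a given action."""
    def __init__(self, obs_dim=64, latent_dim=32, action_dim=6):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)
        self.dynamics = nn.Linear(latent_dim + action_dim, latent_dim)
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, obs, action):
        z = self.encoder(obs)
        z_next = self.dynamics(torch.cat([z, action], dim=-1))
        return z_next, self.reward_head(z_next)

def model_loss(model, obs, action, next_obs, reward):
    """One-step prediction loss: latent consistency plus reward prediction."""
    z_next_pred, reward_pred = model(obs, action)
    with torch.no_grad():  # target latent, no gradient through the target
        z_next_target = model.encoder(next_obs)
    return F.mse_loss(z_next_pred, z_next_target) + F.mse_loss(reward_pred, reward)

def fake_batch(batch_size=16, obs_dim=64, action_dim=6):
    """Placeholder transitions; in practice these come from replay buffers."""
    return (torch.randn(batch_size, obs_dim),
            torch.randn(batch_size, action_dim),
            torch.randn(batch_size, obs_dim),
            torch.randn(batch_size, 1))

model = WorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stage 1 -- offline multi-task pretraining: interleave batches from several
# source tasks so the model absorbs shared visual and dynamics structure.
source_tasks = [fake_batch, fake_batch, fake_batch]  # one sampler per task
for step in range(100):
    for sample in source_tasks:
        loss = model_loss(model, *sample())
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 2 -- online cross-task finetuning: continue training the same weights
# on transitions collected in the new target task (faked here), rather than
# learning a world model from scratch.
for step in range(100):
    loss = model_loss(model, *fake_batch())
    opt.zero_grad(); loss.backward(); opt.step()
```

In the full method, the finetuned world model would additionally be used to generate imagined rollouts for policy improvement, as the abstract describes; that planning component is omitted from this sketch.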