We study the use of model-based reinforcement learning, in particular world models, for continual reinforcement learning. In continual reinforcement learning, an agent must solve a sequence of tasks, one after another, while retaining performance on and preventing forgetting of past tasks. World models offer a task-agnostic solution: they do not require knowledge of task changes. World models are a straightforward baseline for continual reinforcement learning for three main reasons. First, forgetting in the world model is prevented by persisting the experience replay buffer across tasks, so that experience from previous tasks is replayed when learning the world model. Second, they are sample efficient. Third, they offer a task-agnostic exploration strategy through the uncertainty in the trajectories generated by the world model. We show that world models are a simple and effective continual reinforcement learning baseline. We study their effectiveness on the Minigrid and Minihack continual reinforcement learning benchmarks and show that they outperform state-of-the-art task-agnostic continual reinforcement learning methods.