Model-based reinforcement learning (MBRL) methods achieve significant sample efficiency in many tasks, but their performance is often limited by model error. To reduce this error, previous works typically fit the entire environment dynamics with a single well-designed network, treating the dynamics as a black box. However, these methods fail to exploit the decomposable structure of the environment: the dynamics may contain multiple sub-dynamics that can be modeled separately, which allows the world model to be constructed more accurately. In this paper, we propose Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposed manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment, and D2P then constructs the decomposed world model following these sub-dynamics. ED2 can be easily combined with existing MBRL algorithms, and empirical results show that it significantly reduces the model error and boosts the performance of state-of-the-art MBRL algorithms on various tasks.
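To make the idea of a decomposed world model concrete, the following is a minimal illustrative sketch, not the ED2 implementation. The abstract does not specify how sub-dynamics are represented or combined, so everything here is an assumption: sub-dynamics are taken to correspond to groups of action dimensions (the hypothetical `groups` argument, standing in for the output of a discovery step such as SD2), each sub-model is a simple linear map, and the sub-model outputs are summed to form the predicted state change. The class name `DecomposedLinearModel` and its methods are hypothetical.

```python
import numpy as np


class DecomposedLinearModel:
    """Toy decomposed dynamics model (illustrative assumption, not ED2 itself).

    Each group of action dimensions gets its own sub-model. Every sub-model
    sees the full state plus only its own slice of the action, and predicts
    a contribution to the state change. Contributions are summed.
    """

    def __init__(self, state_dim, groups, seed=0):
        rng = np.random.default_rng(seed)
        # `groups` is a hypothetical partition of action indices,
        # e.g. [[0, 1], [2]] for a 3-dimensional action.
        self.groups = [np.asarray(g) for g in groups]
        # One weight matrix per sub-dynamics: [state, sub_action] -> state delta.
        self.weights = [
            rng.normal(scale=0.1, size=(state_dim + len(g), state_dim))
            for g in self.groups
        ]

    def sub_predictions(self, state, action):
        """Return the state-delta contribution of each sub-model."""
        outputs = []
        for g, w in zip(self.groups, self.weights):
            x = np.concatenate([state, action[..., g]], axis=-1)
            outputs.append(x @ w)
        return outputs

    def predict(self, state, action):
        """Predict the next state as state + sum of sub-model contributions."""
        return state + sum(self.sub_predictions(state, action))


# Usage: a batch of 4 transitions with 3-dim states and 3-dim actions.
model = DecomposedLinearModel(state_dim=3, groups=[[0, 1], [2]])
states = np.ones((4, 3))
actions = np.zeros((4, 3))
next_states = model.predict(states, actions)
```

In this sketch, changing an action dimension in one group affects only that group's sub-model output, which is the property a decomposed model is meant to provide. A practical model would replace the linear maps with learned networks and choose the grouping from data rather than by hand.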