Learning dynamics models accurately is an important goal for Model-Based Reinforcement Learning (MBRL), but most MBRL methods learn a dense dynamics model that is vulnerable to spurious correlations and therefore generalizes poorly to unseen states. In this paper, we introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL), which first learns a theoretically justified causal dynamics model that removes unnecessary dependencies between state variables and the action, and thus generalizes well to unseen states. A state abstraction can then be derived from the learned dynamics; it not only improves sample efficiency but also applies to a wider range of tasks than existing state abstraction methods. When evaluated on two simulated environments and downstream tasks, both the dynamics model and the policies learned by the proposed method generalize well to unseen states, and the derived state abstraction improves sample efficiency compared to learning without it.
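To make the idea of a non-dense, dependency-sparse dynamics model concrete, below is a minimal, hypothetical sketch of a factored dynamics model with a learnable dependency mask. It is only an illustration of the general concept, not CDL's actual architecture or training procedure; the class name, layer sizes, and the soft sigmoid mask are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class MaskedFactoredDynamics(nn.Module):
    """Illustrative factored dynamics model: each next-state variable is predicted
    only from the state variables and the action that a learnable mask lets through.
    This is a conceptual sketch, not the CDL implementation."""

    def __init__(self, num_state_vars: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.num_state_vars = num_state_vars
        # mask_logits[i, j]: how strongly input j (a state variable, or the action
        # as the last column) is allowed to influence state variable i
        self.mask_logits = nn.Parameter(torch.zeros(num_state_vars, num_state_vars + 1))
        # one small predictor per state variable
        self.predictors = nn.ModuleList(
            nn.Sequential(
                nn.Linear(num_state_vars + action_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(num_state_vars)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # state: (batch, num_state_vars); action: (batch, action_dim)
        mask = torch.sigmoid(self.mask_logits)  # soft dependency mask in [0, 1]
        next_state = []
        for i, predictor in enumerate(self.predictors):
            # gate the inputs of variable i by its row of the mask
            gated_state = state * mask[i, : self.num_state_vars]
            gated_action = action * mask[i, self.num_state_vars]
            next_state.append(predictor(torch.cat([gated_state, gated_action], dim=-1)))
        return torch.cat(next_state, dim=-1)
```

In this kind of model, a sparse mask exposes which state variables and which action components each variable actually depends on; a task-independent state abstraction could then, in principle, be obtained by discarding variables that no relevant variable depends on, which is the spirit of the abstraction described in the abstract. How the mask is learned and made provably causal is the subject of the paper itself.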