In this paper, we present a world model that learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn invariant causal features, enforcing invariance across augmentations of the irrelevant parts or styles of the observation. World-model-based reinforcement learning methods optimize representation learning and the policy independently, so a naive contrastive loss implementation collapses for lack of supervisory signals to the representation learning module. We propose an intervention-invariant auxiliary task to mitigate this issue: specifically, we use depth prediction to explicitly enforce the invariance, with data augmentation serving as a style intervention on the RGB observation space. Our design thus leverages unsupervised representation learning to learn a world model with invariant causal features. The proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point-navigation tasks on the iGibson dataset. Moreover, it excels at sim-to-real transfer of the perception module. Finally, we evaluate our approach on the DeepMind Control Suite, where depth is not available and invariance can therefore be enforced only implicitly; even so, the proposed model performs on par with its state-of-the-art counterparts.
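To make the training signal concrete, the sketch below combines an InfoNCE contrastive loss over two style-intervened views of the same RGB observation with a depth-prediction head acting as the intervention-invariant auxiliary task. This is a minimal illustration only, assuming PyTorch; the module names, network shapes, and the brightness-jitter augmentation are our own placeholder assumptions, not the authors' implementation.

```python
# Minimal sketch of the described training signal; all names and shapes are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an RGB observation to a latent feature (assumed architecture)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2), nn.ReLU(),
        )
        self.fc = nn.LazyLinear(feat_dim)

    def forward(self, rgb):
        return self.fc(self.conv(rgb).flatten(1))

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: two style-intervened views of the same observation
    are positives; all other pairs in the batch serve as negatives."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)

def style_intervention(rgb):
    """Stand-in for the data augmentation used as a style intervention;
    random brightness jitter keeps the sketch self-contained."""
    scale = torch.empty(rgb.size(0), 1, 1, 1).uniform_(0.8, 1.2)
    return (rgb * scale).clamp(0, 1)

encoder = Encoder()
depth_head = nn.Linear(64, 64 * 64)  # predicts a flattened 64x64 depth map

rgb = torch.rand(8, 3, 64, 64)   # batch of RGB observations (dummy data)
depth = torch.rand(8, 64 * 64)   # ground-truth depth, flattened (dummy data)

# Two independent style interventions of the same observation batch.
z1 = encoder(style_intervention(rgb))
z2 = encoder(style_intervention(rgb))

# Contrastive invariance loss plus the depth auxiliary task, which supplies
# the supervisory signal that keeps the representation from collapsing.
loss = (info_nce(z1, z2)
        + F.mse_loss(depth_head(z1), depth)
        + F.mse_loss(depth_head(z2), depth))
loss.backward()
```

Both intervened views must predict the same depth map, which is what makes depth a natural intervention-invariant target here: the style intervention changes the appearance of the observation but not its geometry.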