The ability to adapt to changes in environmental contingencies is an important challenge in reinforcement learning. Indeed, transferring previously acquired knowledge to environments with unseen structural properties can greatly enhance the flexibility and efficiency with which novel optimal policies are constructed. In this work, we study the problem of transfer learning under changes in the environment dynamics. Specifically, we apply causal reasoning in the offline reinforcement learning setting to transfer a learned policy to new environments, using the Decision Transformer (DT) architecture to distill a new policy for the target environment. The DT is trained on data collected by performing policy rollouts on factual and counterfactual simulations of the source environment. We show that this mechanism can bootstrap a successful policy in the target environment while retaining most of the reward.
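As a rough illustration of the data-collection step described above (not the paper's actual implementation), the sketch below gathers factual and counterfactual rollouts from a source simulator and converts each trajectory into the return-to-go/state/action sequences on which a Decision Transformer is typically trained. The toy environment, the intervened noise parameter, and the behaviour policy are hypothetical placeholders chosen only to make the example self-contained.

```python
import numpy as np

# Hypothetical stand-in for the source simulator: a 1-D control task whose
# transition-noise scale plays the role of the intervenable dynamics parameter.
class ToySourceEnv:
    def __init__(self, noise_scale=0.1, seed=0):
        self.noise_scale = noise_scale
        self.rng = np.random.default_rng(seed)
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # Reward the agent for keeping the state near a target of 1.0.
        self.state += action + self.rng.normal(0.0, self.noise_scale)
        reward = -abs(1.0 - self.state)
        return self.state, reward


def rollout(env, policy, horizon=20):
    """Collect one trajectory of (state, action, reward) triples."""
    states, actions, rewards = [], [], []
    s = env.reset()
    for _ in range(horizon):
        a = policy(s)
        s_next, r = env.step(a)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
    return states, actions, rewards


def to_dt_sequence(states, actions, rewards):
    """Convert a trajectory into the (return-to-go, state, action) tokens a DT consumes."""
    returns_to_go = np.cumsum(rewards[::-1])[::-1]
    return list(zip(returns_to_go, states, actions))


# Behaviour policy learned on the source environment (here just a proportional controller).
source_policy = lambda s: np.clip(1.0 - s, -0.5, 0.5)

# Factual rollouts use the observed dynamics; counterfactual rollouts intervene on the
# dynamics parameter to emulate structurally different target environments.
dataset = []
for noise in [0.1, 0.3, 0.5]:  # 0.1 = factual setting, the rest = counterfactual interventions
    env = ToySourceEnv(noise_scale=noise)
    for _ in range(10):
        dataset.append(to_dt_sequence(*rollout(env, source_policy)))

print(f"collected {len(dataset)} trajectories for Decision Transformer training")
```

The resulting sequences could then be fed to any standard Decision Transformer training loop; the key point of the sketch is that the dataset mixes trajectories generated under the factual dynamics with trajectories generated under intervened (counterfactual) dynamics.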