Despite the recent success of deep reinforcement learning (RL), domain adaptation remains an open problem. Although the generalization ability of RL agents is critical for the real-world applicability of deep RL, zero-shot policy transfer is still challenging: even minor visual changes can cause a trained agent to fail completely in a new task. To address this issue, we propose a two-stage RL agent that first learns a latent unified state representation (LUSR) that is consistent across multiple domains, and then performs RL training in a single source domain on top of LUSR. The cross-domain consistency of LUSR allows the policy acquired in the source domain to generalize to other target domains without additional training. We first demonstrate our approach on variants of the CarRacing game with customized visual manipulations, and then verify it in CARLA, an autonomous driving simulator with more complex and realistic visual observations. Our results show that this approach achieves state-of-the-art domain adaptation performance on related RL tasks and outperforms prior methods based on latent-representation RL and image-to-image translation.
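To make the two-stage structure concrete, below is a minimal sketch of the pipeline described above: a representation encoder trained across domains (stage one), then a policy head trained on the frozen latent in a single source domain (stage two). The network shapes, the `train_lusr_encoder` helper, and the use of PyTorch are illustrative assumptions, not the paper's exact architecture or training objective.

```python
# Minimal sketch of the two-stage idea; all names and shapes here are assumptions.
import torch
import torch.nn as nn

class LUSREncoder(nn.Module):
    """Stage 1: maps raw observations to a latent assumed to be shared across domains."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(128 * 6 * 6, latent_dim)  # assumes 64x64 RGB inputs

    def forward(self, obs):
        return self.fc(self.conv(obs))

class Policy(nn.Module):
    """Stage 2: small policy head trained on top of the frozen LUSR latent."""
    def __init__(self, latent_dim=32, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, z):
        return self.net(z)

# Stage 1: learn the encoder from observations collected in multiple domains.
# The actual objective (separating domain-general from domain-specific factors)
# is specific to the paper and abstracted behind a hypothetical helper here.
encoder = LUSREncoder()
# train_lusr_encoder(encoder, multi_domain_observations)  # hypothetical helper

# Stage 2: freeze the encoder and train the policy in a single source domain.
for p in encoder.parameters():
    p.requires_grad = False
policy = Policy()

obs = torch.randn(1, 3, 64, 64)       # stand-in for a source-domain frame
action_logits = policy(encoder(obs))  # the same frozen encoder is applied to
                                      # target-domain frames at test time
```

Because only the encoder sees multiple domains and the policy consumes nothing but the latent, zero-shot transfer reduces to the encoder producing consistent latents for visually different but semantically equivalent observations.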